From patchwork Thu Sep 9 18:47:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483653 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C988C433EF for ; Thu, 9 Sep 2021 18:47:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58F2A610FF for ; Thu, 9 Sep 2021 18:47:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244774AbhIIStC (ORCPT ); Thu, 9 Sep 2021 14:49:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244193AbhIISs6 (ORCPT ); Thu, 9 Sep 2021 14:48:58 -0400 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55F9EC061575 for ; Thu, 9 Sep 2021 11:47:48 -0700 (PDT) Received: by mail-wm1-x32f.google.com with SMTP id s24so2052645wmh.4 for ; Thu, 09 Sep 2021 11:47:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=6nw2gGbgLgD/ItZe7r3+xcBp3zrwfuUcf5x285TXfC0=; b=BFzYgpkWg1tboamSe2QUC7zbdp8MD8sOU9Hw3IK4grBTQACrNDuRb77p5wKnaBzv5Z fgRtJzM/XIp+I03j8t9jYUsT3u7S494t5yajhtvEbRLL3eDnlQQgkQKx6R8r8MBVD8f+ NICbpDHgBnsxvXNauQ8jhFS/eiIGIkXYCUnqRz4Wey8tfeB6BOIXr88Jp4J2mY9k/kRk adTbgONNiIi3h1miGzISRGJ1T29VzngYndHaAusfkRbK/h2dUoeJ5+TfQkUaa/vjILUn uujBhFaCRucyzw//uipZk3iGcEb2AJAHaZLC27U1YxzLaG02gJcO9dvVb5WurVHt+6N9 jgbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=6nw2gGbgLgD/ItZe7r3+xcBp3zrwfuUcf5x285TXfC0=; b=vAEGfbr/Ysf143mtRyLFpSoXjexCN+qFF6DL20bV84tA/mX5OOVEiTaSJTCsMhpKeQ kcegeF4wU5xktb1LwK0KAPY9JCyO4W8asDtk/gxVkKbRdgLWgE4Tuv9ASgNnyE1KQe3M f1SNbb0TnGrYa3swr3HzYsy7Bl1BLZpSvdGsljTCUS02rPadS6zHCcfrFxrlfZjBO8Fq OUPtB0F+VFh8OJNHGruj7dwfTDxor3XNaf3CAuJisjkL1YlFk4YFWViA4mI8H0YyyxX7 pNGJXxx7wrW0OBuS0iCx9xATCERh3jadhMNybwxMld0ywv28UgJvKFkAQ9awyPYRfWFM bULA== X-Gm-Message-State: AOAM531BG+JYVZKaqXGH8lHyqPuuhUwLpSCVHzbwLhWOiBCu9Quo/ZFF l/8NGS3b26txqMJlpxxUppWzKaY0yG0= X-Google-Smtp-Source: ABdhPJxVfj7sfdwsReUAivwkw0TbLg++rwwYkpCHQRpQf3FJ8ha8S7TmIvRy4QfYqqHCsx28+jPLmw== X-Received: by 2002:a7b:c114:: with SMTP id w20mr4459658wmi.80.1631213266993; Thu, 09 Sep 2021 11:47:46 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o5sm2425311wrw.17.2021.09.09.11.47.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:46 -0700 (PDT) Message-Id: <8534c69cf84e2e4d39f57d5cafa4dec22d82ee7d.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:26 +0000 Subject: [PATCH v2 01/19] hash.h: provide constants for the hash IDs Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This will simplify referencing them from code that is not deeply integrated with Git, in particular, the reftable library. Signed-off-by: Han-Wen Nienhuys --- hash.h | 6 ++++++ object-file.c | 7 ++----- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/hash.h b/hash.h index 9e25c40e9ac..5d40368f18a 100644 --- a/hash.h +++ b/hash.h @@ -95,12 +95,18 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s /* Number of algorithms supported (including unknown). */ #define GIT_HASH_NALGOS (GIT_HASH_SHA256 + 1) +/* "sha1", big-endian */ +#define GIT_SHA1_FORMAT_ID 0x73686131 + /* The length in bytes and in hex digits of an object name (SHA-1 value). */ #define GIT_SHA1_RAWSZ 20 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ) /* The block size of SHA-1. */ #define GIT_SHA1_BLKSZ 64 +/* "s256", big-endian */ +#define GIT_SHA256_FORMAT_ID 0x73323536 + /* The length in bytes and in hex digits of an object name (SHA-256 value). */ #define GIT_SHA256_RAWSZ 32 #define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ) diff --git a/object-file.c b/object-file.c index a8be8994814..7bfd5e6e2e9 100644 --- a/object-file.c +++ b/object-file.c @@ -164,7 +164,6 @@ static void git_hash_unknown_final_oid(struct object_id *oid, git_hash_ctx *ctx) BUG("trying to finalize unknown hash"); } - const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { { NULL, @@ -183,8 +182,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { }, { "sha1", - /* "sha1", big-endian */ - 0x73686131, + GIT_SHA1_FORMAT_ID, GIT_SHA1_RAWSZ, GIT_SHA1_HEXSZ, GIT_SHA1_BLKSZ, @@ -199,8 +197,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { }, { "sha256", - /* "s256", big-endian */ - 0x73323536, + GIT_SHA256_FORMAT_ID, GIT_SHA256_RAWSZ, GIT_SHA256_HEXSZ, GIT_SHA256_BLKSZ, From patchwork Thu Sep 9 18:47:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483655 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C516C433FE for ; Thu, 9 Sep 2021 18:47:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 24B2561108 for ; Thu, 9 Sep 2021 18:47:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244821AbhIIStC (ORCPT ); Thu, 9 Sep 2021 14:49:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242059AbhIISs6 (ORCPT ); Thu, 9 Sep 2021 14:48:58 -0400 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCE32C061757 for ; Thu, 9 Sep 2021 11:47:48 -0700 (PDT) Received: by mail-wm1-x32e.google.com with SMTP id j17-20020a05600c1c1100b002e754875260so2125884wms.4 for ; Thu, 09 Sep 2021 11:47:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=yj9tx1HJN1gjVb7BbFdzwGzB8sZO/EgnY23zy/aVzOc=; b=eI1EYohSzVG1rQPVjXAS9p+Ridub4zfq42lo/lAzSDI83jwk8tgeCrEKbWhrC+P7dI owyqkyhDfIUFdT6I9NyERn5/El7GIstmjIJCuTAUNVQZwrseFPl7Vi0FOiQwYMlzem2p gWL/1YUi+Ch8ncd4Xs6h61gMCzSo0cSnGmexfHOF8Muoq2Zr9sTU99Y0ILlggLvfpsFh dFAIHcYPyasvaTyi8dDTbmkr2kV0KQpljVTKSMgxjjCVnbMKkFcVVDGnkMYfuWymkJsJ 423FVyNjWoPoa/ZIARdF0BgaxJMhAeAP2ALpvWHTw9+z4AP0xhHE+EtW2DKacGc0sqNL c2xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=yj9tx1HJN1gjVb7BbFdzwGzB8sZO/EgnY23zy/aVzOc=; b=hJSUnUVJAmYAu5Ni+uTXvCDBlXN69kUARwBplik01ZyO8XkPdnmsm7mrD3Fy/tWfbL sS7Svn5UyGzEmkBbWpYTObmHk1TSq150hPkclFtNHZ/jPV2VqcQTo5OrOtOVyKTpcs5M 7qpZADBKBb2ji6ZR6qLnWe2ALjOYKV6Wh7mR2QzkKk0GrGeqx6n5wYCFsXENxcSRVBVq W9FEpsMdnTX2EfB5QeFuQ271ubGSgO4mZ0nwiwJMb3kC+TdhxY9Wj+8IzDHDsbYw0wkL pWxdUdNT9qNJ5nCohJkdB/k/MmloyxbWje43NAbbg9jLYaKcDj9LgXs/8ksOFeVzewOr 7Pcg== X-Gm-Message-State: AOAM530ugb01oWUVS820SIBELgvs4bANxDqfsosaRWNWzfGZ1KqP58F7 ze+DnNTY6HVUcmt2TWRDVTZJXgytCk4= X-Google-Smtp-Source: ABdhPJzBRUPFNGiOUm6KWYQpe9rf0Y82sA8pcPmsE2eNZ+1/Go3Ghm1SnezAElBhQHrydpJm/gNwIw== X-Received: by 2002:a05:600c:1990:: with SMTP id t16mr4537728wmq.45.1631213267569; Thu, 09 Sep 2021 11:47:47 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v9sm2197238wml.46.2021.09.09.11.47.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:47 -0700 (PDT) Message-Id: <52968d8e83f392d3a12a6fb8483206aecb3ccf1c.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:27 +0000 Subject: [PATCH v2 02/19] reftable: RFC: add LICENSE Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The objective of this code is to be usable as a C library, so it can be reused in libgit2. This is currently using a BSD license as it is the liberal license I could find, but this could be changed to whatever fits the stated goal above. This code is currently imported from github.com/hanwen/reftable. Once this code lands in git.git, the C code will be removed from github.com/hanwen/reftable, and the git.git code will be the source of truth. Signed-off-by: Han-Wen Nienhuys --- reftable/LICENSE | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 reftable/LICENSE diff --git a/reftable/LICENSE b/reftable/LICENSE new file mode 100644 index 00000000000..402e0f9356b --- /dev/null +++ b/reftable/LICENSE @@ -0,0 +1,31 @@ +BSD License + +Copyright (c) 2020, Google LLC +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, +this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in the +documentation and/or other materials provided with the distribution. + +* Neither the name of Google LLC nor the names of its contributors may +be used to endorse or promote products derived from this software +without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. From patchwork Thu Sep 9 18:47:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0904CC4332F for ; Thu, 9 Sep 2021 18:47:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E737B61100 for ; Thu, 9 Sep 2021 18:47:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244877AbhIIStD (ORCPT ); Thu, 9 Sep 2021 14:49:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48336 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244349AbhIISs7 (ORCPT ); Thu, 9 Sep 2021 14:48:59 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D19F9C061575 for ; Thu, 9 Sep 2021 11:47:49 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id q26so3937126wrc.7 for ; Thu, 09 Sep 2021 11:47:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=/Izbl9wexLtHbO918dhEysjpUsFaAvsDqZEji3I0p88=; b=UsvHFXPJ/GC6aCd6ijXllYLhznoxGaXvN8thoR7JWZ4uoYvR/u6whjkCpwW9sk6Wts HvMQdqYk6JkG6N4uAG2h25IBmJJASA73sCIoSiVsM7XtHt9JSF8dzZb0E2dNu5x0trni Z/81Dm6nIPC4cugnCNT/5BlhdUTfZurnZWuH0hle/iYjTdpGH2mubEbJoY84Z9Yh7q6N G+U9eWrXDSJLJBUbnsZPHqoOWGgj2cqicPbo3Us7z5diVDdY21KmeTYSE2uUORPXK6Hg 1zrlULVyNBrBd7xgJ1Gtt863bBi5+H5tMlD1/Y00Z3JkPVlsXJFlb0Pk9oPnvxqdOwTd r5UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=/Izbl9wexLtHbO918dhEysjpUsFaAvsDqZEji3I0p88=; b=uLTRisF6V5qBIGBfUIniOiLCP3UMkhRkMK06K2I9XHUaaNNqrj0p0AlmBUgwxo4gmn Yp1XcFkb5cHmh+Q1rkRH7zmcYf5vQCbNS8hXwl/wYjCmCxRQk9H1VSo7MqM0HDY62e0+ XCuiefpVNXjwh/OVtzylZqTFBiFwIn0ufbSdn/NsZ79ac8/BqR23lkKLRCQ+WB6UYEG2 PHnTRVCacN/x+yaIMHC6IRT6CaS3GcsLKLSQ96+UzidcozsD6dfDsb/OdnM3YmDXDyEh MmcigAm4E7uecEkhACFYEPsOZzL4JVDjLhuak3Jb/R/mgdCrIpQ/3dFc0SDEEo53Mq/U 4RDg== X-Gm-Message-State: AOAM532M4eJ2erfcoG5QKkThNS8TTTyTeAukVmQiMlMpibC+2mntM+lL Hav9bz1SiM3sXapsSvlTFvGj3KpZ32g= X-Google-Smtp-Source: ABdhPJxeWh1bRuCLknJ+YwAPUM78Vza0xVPfSS07w+Hl2mEn9XGqU4/D7hnXY/rlZbmL5sIDki9cig== X-Received: by 2002:adf:b748:: with SMTP id n8mr5393819wre.133.1631213268116; Thu, 09 Sep 2021 11:47:48 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r27sm2518106wrr.70.2021.09.09.11.47.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:47 -0700 (PDT) Message-Id: <84c8b396df53c23daeeb0ef4c6ef263724413a9e.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:28 +0000 Subject: [PATCH v2 03/19] reftable: add error related functionality Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The reftable/ directory is structured as a library, so it cannot crash on misuse. Instead, it returns an error codes. In addition, the error code can be used to signal conditions from lower levels of the library to be handled by higher levels of the library. For example, a transaction might legitimately write an empty reftable file, but in that case, we'd want to shortcut the transaction overhead. Signed-off-by: Han-Wen Nienhuys --- reftable/error.c | 41 ++++++++++++++++++++++++++ reftable/reftable-error.h | 62 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 103 insertions(+) create mode 100644 reftable/error.c create mode 100644 reftable/reftable-error.h diff --git a/reftable/error.c b/reftable/error.c new file mode 100644 index 00000000000..f6f16def921 --- /dev/null +++ b/reftable/error.c @@ -0,0 +1,41 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "reftable-error.h" + +#include + +const char *reftable_error_str(int err) +{ + static char buf[250]; + switch (err) { + case REFTABLE_IO_ERROR: + return "I/O error"; + case REFTABLE_FORMAT_ERROR: + return "corrupt reftable file"; + case REFTABLE_NOT_EXIST_ERROR: + return "file does not exist"; + case REFTABLE_LOCK_ERROR: + return "data is outdated"; + case REFTABLE_API_ERROR: + return "misuse of the reftable API"; + case REFTABLE_ZLIB_ERROR: + return "zlib failure"; + case REFTABLE_NAME_CONFLICT: + return "file/directory conflict"; + case REFTABLE_EMPTY_TABLE_ERROR: + return "wrote empty table"; + case REFTABLE_REFNAME_ERROR: + return "invalid refname"; + case -1: + return "general error"; + default: + snprintf(buf, sizeof(buf), "unknown error code %d", err); + return buf; + } +} diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h new file mode 100644 index 00000000000..6f89bedf1a5 --- /dev/null +++ b/reftable/reftable-error.h @@ -0,0 +1,62 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_ERROR_H +#define REFTABLE_ERROR_H + +/* + * Errors in reftable calls are signaled with negative integer return values. 0 + * means success. + */ +enum reftable_error { + /* Unexpected file system behavior */ + REFTABLE_IO_ERROR = -2, + + /* Format inconsistency on reading data */ + REFTABLE_FORMAT_ERROR = -3, + + /* File does not exist. Returned from block_source_from_file(), because + * it needs special handling in stack. + */ + REFTABLE_NOT_EXIST_ERROR = -4, + + /* Trying to write out-of-date data. */ + REFTABLE_LOCK_ERROR = -5, + + /* Misuse of the API: + * - on writing a record with NULL refname. + * - on writing a reftable_ref_record outside the table limits + * - on writing a ref or log record before the stack's + * next_update_inde*x + * - on writing a log record with multiline message with + * exact_log_message unset + * - on reading a reftable_ref_record from log iterator, or vice versa. + * + * When a call misuses the API, the internal state of the library is + * kept unchanged. + */ + REFTABLE_API_ERROR = -6, + + /* Decompression error */ + REFTABLE_ZLIB_ERROR = -7, + + /* Wrote a table without blocks. */ + REFTABLE_EMPTY_TABLE_ERROR = -8, + + /* Dir/file conflict. */ + REFTABLE_NAME_CONFLICT = -9, + + /* Invalid ref name. */ + REFTABLE_REFNAME_ERROR = -10, +}; + +/* convert the numeric error code to a string. The string should not be + * deallocated. */ +const char *reftable_error_str(int err); + +#endif From patchwork Thu Sep 9 18:47:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5A5AC433EF for ; Thu, 9 Sep 2021 18:48:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F10A61100 for ; Thu, 9 Sep 2021 18:48:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244336AbhIISt2 (ORCPT ); Thu, 9 Sep 2021 14:49:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244469AbhIIStA (ORCPT ); Thu, 9 Sep 2021 14:49:00 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 358EEC06175F for ; Thu, 9 Sep 2021 11:47:50 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id t18so3980993wrb.0 for ; Thu, 09 Sep 2021 11:47:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ae4MHo7JtDs9a+PsJzP8Nl6SjHFhyQr11bkmy7HliXI=; b=q5w9pIRm7ORzTffG3wWxQHi7LFTpETcNSWmGWgwtrtQBGOB8ZfgB4u/3sF5sg4HKtY UehGak3aaIg8MzZ/zvz23Z1Fz07VCbHb1v6EhUsCWM0WLgsocqb5IKxNNNKFIL9QPjF2 6euIm0JW4JVEDHt47t/NhbVr+vZ0DHbee+wy8NfIJftTf2Xun1Mt72ehDrhX7s3sa060 L1Yk6AJne4Yut15dPj/8o88g9dCCsezqI3Z5hwhCEblT+bIl6uXB9virTAdS7AECeLD+ hXsODKfuZr6rdS48K+eBTNSXKRcjah1U9OUGe7r0gHElgAebut1OsCCfdCpAsQZ7lGxN bDbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ae4MHo7JtDs9a+PsJzP8Nl6SjHFhyQr11bkmy7HliXI=; b=c300XycNmJ1IvDNY/C/5F2DVVgP9tQzQefivo5bA9tuwptHinhC9evQijJSdILV+jG F3+R93R9WMsUa0dwg+BRFDvExsaGENn29Snb/+76ibvm5jg4aMasaAjzswmKqX6/3cKy 5vSVVzwsNAL/LC1vNzdR0+VSC2Q23SiEBQjBk2koJVmPa30nxGM4VPbF3Lh0XeJ0ga+c bLaNNU8GfKVPQgAENeNSI4ZK8jfGkVr19uOi/UrKTydHmrDXHla6PoFIL7WPCB7YsOjr t9+u5lBUAUBbLFxPHjp9DMam7/9BOykCEqqR0lMv/Gj8IyuLcbj22ngY1VOocecvP5cd Brug== X-Gm-Message-State: AOAM531I0vLJwDhMVvXYUflD00tlSBxFoCrrz8HigN8gWVjWIJBjXGdF zslhi2M7XXCLpYcNe843rN6JmuHpdu0= X-Google-Smtp-Source: ABdhPJx9qygcU4GAjoNqQHNrr2u/6O8rnKaiCK8sXubc7UgjiWwuclXPCQe/e9DUhpasIx5+9vhuBg== X-Received: by 2002:adf:a197:: with SMTP id u23mr5301502wru.253.1631213268732; Thu, 09 Sep 2021 11:47:48 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id p4sm2244312wmc.11.2021.09.09.11.47.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:48 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:29 +0000 Subject: [PATCH v2 04/19] reftable: utility functions Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This commit provides basic utility classes for the reftable library. Signed-off-by: Han-Wen Nienhuys Helped-by: Johannes Schindelin --- Makefile | 25 +++- contrib/buildsystems/CMakeLists.txt | 14 ++- contrib/buildsystems/Generators/Vcxproj.pm | 11 +- reftable/basics.c | 128 +++++++++++++++++++++ reftable/basics.h | 60 ++++++++++ reftable/basics_test.c | 98 ++++++++++++++++ reftable/publicbasics.c | 58 ++++++++++ reftable/reftable-malloc.h | 18 +++ reftable/reftable-tests.h | 22 ++++ reftable/system.h | 32 ++++++ reftable/test_framework.c | 23 ++++ reftable/test_framework.h | 53 +++++++++ t/helper/test-reftable.c | 9 ++ t/helper/test-tool.c | 3 +- t/helper/test-tool.h | 1 + t/t0032-reftable-unittest.sh | 15 +++ 16 files changed, 563 insertions(+), 7 deletions(-) create mode 100644 reftable/basics.c create mode 100644 reftable/basics.h create mode 100644 reftable/basics_test.c create mode 100644 reftable/publicbasics.c create mode 100644 reftable/reftable-malloc.h create mode 100644 reftable/reftable-tests.h create mode 100644 reftable/system.h create mode 100644 reftable/test_framework.c create mode 100644 reftable/test_framework.h create mode 100644 t/helper/test-reftable.c create mode 100755 t/t0032-reftable-unittest.sh diff --git a/Makefile b/Makefile index 429c276058d..0bf5e49aa61 100644 --- a/Makefile +++ b/Makefile @@ -743,6 +743,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o TEST_BUILTINS_OBJS += test-read-graph.o TEST_BUILTINS_OBJS += test-read-midx.o TEST_BUILTINS_OBJS += test-ref-store.o +TEST_BUILTINS_OBJS += test-reftable.o TEST_BUILTINS_OBJS += test-regex.o TEST_BUILTINS_OBJS += test-repository.o TEST_BUILTINS_OBJS += test-revision-walking.o @@ -821,6 +822,8 @@ TEST_SHELL_PATH = $(SHELL_PATH) LIB_FILE = libgit.a XDIFF_LIB = xdiff/lib.a +REFTABLE_LIB = reftable/libreftable.a +REFTABLE_TEST_LIB = reftable/libreftable_test.a GENERATED_H += command-list.h GENERATED_H += config-list.h @@ -1195,7 +1198,7 @@ THIRD_PARTY_SOURCES += compat/regex/% THIRD_PARTY_SOURCES += sha1collisiondetection/% THIRD_PARTY_SOURCES += sha1dc/% -GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) +GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB) EXTLIBS = GIT_USER_AGENT = git/$(GIT_VERSION) @@ -2446,7 +2449,15 @@ XDIFF_OBJS += xdiff/xutils.o .PHONY: xdiff-objs xdiff-objs: $(XDIFF_OBJS) +REFTABLE_OBJS += reftable/basics.o +REFTABLE_OBJS += reftable/error.o +REFTABLE_OBJS += reftable/publicbasics.o + +REFTABLE_TEST_OBJS += reftable/test_framework.o +REFTABLE_TEST_OBJS += reftable/basics_test.o + TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS)) + .PHONY: test-objs test-objs: $(TEST_OBJS) @@ -2462,6 +2473,8 @@ OBJECTS += $(PROGRAM_OBJS) OBJECTS += $(TEST_OBJS) OBJECTS += $(XDIFF_OBJS) OBJECTS += $(FUZZ_OBJS) +OBJECTS += $(REFTABLE_OBJS) $(REFTABLE_TEST_OBJS) + ifndef NO_CURL OBJECTS += http.o http-walker.o remote-curl.o endif @@ -2612,6 +2625,12 @@ $(LIB_FILE): $(LIB_OBJS) $(XDIFF_LIB): $(XDIFF_OBJS) $(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^ +$(REFTABLE_LIB): $(REFTABLE_OBJS) + $(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^ + +$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS) + $(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^ + export DEFAULT_EDITOR DEFAULT_PAGER Documentation/GIT-EXCLUDED-PROGRAMS: FORCE @@ -2904,7 +2923,7 @@ perf: all t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS)) -t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) +t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB) $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS) check-sha1:: t/helper/test-tool$X @@ -3234,7 +3253,7 @@ cocciclean: clean: profile-clean coverage-clean cocciclean $(RM) *.res $(RM) $(OBJECTS) - $(RM) $(LIB_FILE) $(XDIFF_LIB) + $(RM) $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB) $(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X $(RM) $(TEST_PROGRAMS) $(RM) $(FUZZ_PROGRAMS) diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt index 171b4124afe..c2bf5bdffc6 100644 --- a/contrib/buildsystems/CMakeLists.txt +++ b/contrib/buildsystems/CMakeLists.txt @@ -640,6 +640,12 @@ parse_makefile_for_sources(libxdiff_SOURCES "XDIFF_OBJS") list(TRANSFORM libxdiff_SOURCES PREPEND "${CMAKE_SOURCE_DIR}/") add_library(xdiff STATIC ${libxdiff_SOURCES}) +#reftable +parse_makefile_for_sources(reftable_SOURCES "REFTABLE_OBJS") + +list(TRANSFORM reftable_SOURCES PREPEND "${CMAKE_SOURCE_DIR}/") +add_library(reftable STATIC ${reftable_SOURCES}) + if(WIN32) if(NOT MSVC)#use windres when compiling with gcc and clang add_custom_command(OUTPUT ${CMAKE_BINARY_DIR}/git.res @@ -662,7 +668,7 @@ endif() #link all required libraries to common-main add_library(common-main OBJECT ${CMAKE_SOURCE_DIR}/common-main.c) -target_link_libraries(common-main libgit xdiff ${ZLIB_LIBRARIES}) +target_link_libraries(common-main libgit xdiff reftable ${ZLIB_LIBRARIES}) if(Intl_FOUND) target_link_libraries(common-main ${Intl_LIBRARIES}) endif() @@ -902,11 +908,15 @@ if(BUILD_TESTING) add_executable(test-fake-ssh ${CMAKE_SOURCE_DIR}/t/helper/test-fake-ssh.c) target_link_libraries(test-fake-ssh common-main) +#reftable-tests +parse_makefile_for_sources(test-reftable_SOURCES "REFTABLE_TEST_OBJS") +list(TRANSFORM test-reftable_SOURCES PREPEND "${CMAKE_SOURCE_DIR}/") + #test-tool parse_makefile_for_sources(test-tool_SOURCES "TEST_BUILTINS_OBJS") list(TRANSFORM test-tool_SOURCES PREPEND "${CMAKE_SOURCE_DIR}/t/helper/") -add_executable(test-tool ${CMAKE_SOURCE_DIR}/t/helper/test-tool.c ${test-tool_SOURCES}) +add_executable(test-tool ${CMAKE_SOURCE_DIR}/t/helper/test-tool.c ${test-tool_SOURCES} ${test-reftable_SOURCES}) target_link_libraries(test-tool common-main) set_target_properties(test-fake-ssh test-tool diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm index d2584450ba1..1a25789d285 100644 --- a/contrib/buildsystems/Generators/Vcxproj.pm +++ b/contrib/buildsystems/Generators/Vcxproj.pm @@ -77,7 +77,7 @@ sub createProject { my $libs_release = "\n "; my $libs_debug = "\n "; if (!$static_library) { - $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}})); + $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}})); $libs_debug = $libs_release; $libs_debug =~ s/zlib\.lib/zlibd\.lib/g; $libs_debug =~ s/libexpat\.lib/libexpatd\.lib/g; @@ -232,6 +232,7 @@ EOM EOM if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') { my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"}; + my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"}; my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"}; print F << "EOM"; @@ -241,6 +242,14 @@ EOM false EOM + if (!($name =~ /xdiff|libreftable/)) { + print F << "EOM"; + + $uuid_libreftable + false + +EOM + } if (!($name =~ 'xdiff')) { print F << "EOM"; diff --git a/reftable/basics.c b/reftable/basics.c new file mode 100644 index 00000000000..f761e48028c --- /dev/null +++ b/reftable/basics.c @@ -0,0 +1,128 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "basics.h" + +void put_be24(uint8_t *out, uint32_t i) +{ + out[0] = (uint8_t)((i >> 16) & 0xff); + out[1] = (uint8_t)((i >> 8) & 0xff); + out[2] = (uint8_t)(i & 0xff); +} + +uint32_t get_be24(uint8_t *in) +{ + return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 | + (uint32_t)(in[2]); +} + +void put_be16(uint8_t *out, uint16_t i) +{ + out[0] = (uint8_t)((i >> 8) & 0xff); + out[1] = (uint8_t)(i & 0xff); +} + +int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args) +{ + size_t lo = 0; + size_t hi = sz; + + /* Invariants: + * + * (hi == sz) || f(hi) == true + * (lo == 0 && f(0) == true) || fi(lo) == false + */ + while (hi - lo > 1) { + size_t mid = lo + (hi - lo) / 2; + + if (f(mid, args)) + hi = mid; + else + lo = mid; + } + + if (lo) + return hi; + + return f(0, args) ? 0 : 1; +} + +void free_names(char **a) +{ + char **p; + if (!a) { + return; + } + for (p = a; *p; p++) { + reftable_free(*p); + } + reftable_free(a); +} + +int names_length(char **names) +{ + char **p = names; + for (; *p; p++) { + /* empty */ + } + return p - names; +} + +void parse_names(char *buf, int size, char ***namesp) +{ + char **names = NULL; + size_t names_cap = 0; + size_t names_len = 0; + + char *p = buf; + char *end = buf + size; + while (p < end) { + char *next = strchr(p, '\n'); + if (next && next < end) { + *next = 0; + } else { + next = end; + } + if (p < next) { + if (names_len == names_cap) { + names_cap = 2 * names_cap + 1; + names = reftable_realloc( + names, names_cap * sizeof(*names)); + } + names[names_len++] = xstrdup(p); + } + p = next + 1; + } + + names = reftable_realloc(names, (names_len + 1) * sizeof(*names)); + names[names_len] = NULL; + *namesp = names; +} + +int names_equal(char **a, char **b) +{ + int i = 0; + for (; a[i] && b[i]; i++) { + if (strcmp(a[i], b[i])) { + return 0; + } + } + + return a[i] == b[i]; +} + +int common_prefix_size(struct strbuf *a, struct strbuf *b) +{ + int p = 0; + for (; p < a->len && p < b->len; p++) { + if (a->buf[p] != b->buf[p]) + break; + } + + return p; +} diff --git a/reftable/basics.h b/reftable/basics.h new file mode 100644 index 00000000000..096b36862b9 --- /dev/null +++ b/reftable/basics.h @@ -0,0 +1,60 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef BASICS_H +#define BASICS_H + +/* + * miscellaneous utilities that are not provided by Git. + */ + +#include "system.h" + +/* Bigendian en/decoding of integers */ + +void put_be24(uint8_t *out, uint32_t i); +uint32_t get_be24(uint8_t *in); +void put_be16(uint8_t *out, uint16_t i); + +/* + * find smallest index i in [0, sz) at which f(i) is true, assuming + * that f is ascending. Return sz if f(i) is false for all indices. + * + * Contrary to bsearch(3), this returns something useful if the argument is not + * found. + */ +int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args); + +/* + * Frees a NULL terminated array of malloced strings. The array itself is also + * freed. + */ +void free_names(char **a); + +/* parse a newline separated list of names. `size` is the length of the buffer, + * without terminating '\0'. Empty names are discarded. */ +void parse_names(char *buf, int size, char ***namesp); + +/* compares two NULL-terminated arrays of strings. */ +int names_equal(char **a, char **b); + +/* returns the array size of a NULL-terminated array of strings. */ +int names_length(char **names); + +/* Allocation routines; they invoke the functions set through + * reftable_set_alloc() */ +void *reftable_malloc(size_t sz); +void *reftable_realloc(void *p, size_t sz); +void reftable_free(void *p); +void *reftable_calloc(size_t sz); + +/* Find the longest shared prefix size of `a` and `b` */ +struct strbuf; +int common_prefix_size(struct strbuf *a, struct strbuf *b); + +#endif diff --git a/reftable/basics_test.c b/reftable/basics_test.c new file mode 100644 index 00000000000..1fcd2297256 --- /dev/null +++ b/reftable/basics_test.c @@ -0,0 +1,98 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" + +#include "basics.h" +#include "test_framework.h" +#include "reftable-tests.h" + +struct binsearch_args { + int key; + int *arr; +}; + +static int binsearch_func(size_t i, void *void_args) +{ + struct binsearch_args *args = void_args; + + return args->key < args->arr[i]; +} + +static void test_binsearch(void) +{ + int arr[] = { 2, 4, 6, 8, 10 }; + size_t sz = ARRAY_SIZE(arr); + struct binsearch_args args = { + .arr = arr, + }; + + int i = 0; + for (i = 1; i < 11; i++) { + int res; + args.key = i; + res = binsearch(sz, &binsearch_func, &args); + + if (res < sz) { + EXPECT(args.key < arr[res]); + if (res > 0) { + EXPECT(args.key >= arr[res - 1]); + } + } else { + EXPECT(args.key == 10 || args.key == 11); + } + } +} + +static void test_names_length(void) +{ + char *a[] = { "a", "b", NULL }; + EXPECT(names_length(a) == 2); +} + +static void test_parse_names_normal(void) +{ + char in[] = "a\nb\n"; + char **out = NULL; + parse_names(in, strlen(in), &out); + EXPECT(!strcmp(out[0], "a")); + EXPECT(!strcmp(out[1], "b")); + EXPECT(!out[2]); + free_names(out); +} + +static void test_parse_names_drop_empty(void) +{ + char in[] = "a\n\n"; + char **out = NULL; + parse_names(in, strlen(in), &out); + EXPECT(!strcmp(out[0], "a")); + EXPECT(!out[1]); + free_names(out); +} + +static void test_common_prefix(void) +{ + struct strbuf s1 = STRBUF_INIT; + struct strbuf s2 = STRBUF_INIT; + strbuf_addstr(&s1, "abcdef"); + strbuf_addstr(&s2, "abc"); + EXPECT(common_prefix_size(&s1, &s2) == 3); + strbuf_release(&s1); + strbuf_release(&s2); +} + +int basics_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_common_prefix); + RUN_TEST(test_parse_names_normal); + RUN_TEST(test_parse_names_drop_empty); + RUN_TEST(test_binsearch); + RUN_TEST(test_names_length); + return 0; +} diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c new file mode 100644 index 00000000000..bd0a02d3f68 --- /dev/null +++ b/reftable/publicbasics.c @@ -0,0 +1,58 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "reftable-malloc.h" + +#include "basics.h" +#include "system.h" + +static void *(*reftable_malloc_ptr)(size_t sz) = &malloc; +static void *(*reftable_realloc_ptr)(void *, size_t) = &realloc; +static void (*reftable_free_ptr)(void *) = &free; + +void *reftable_malloc(size_t sz) +{ + return (*reftable_malloc_ptr)(sz); +} + +void *reftable_realloc(void *p, size_t sz) +{ + return (*reftable_realloc_ptr)(p, sz); +} + +void reftable_free(void *p) +{ + reftable_free_ptr(p); +} + +void *reftable_calloc(size_t sz) +{ + void *p = reftable_malloc(sz); + memset(p, 0, sz); + return p; +} + +void reftable_set_alloc(void *(*malloc)(size_t), + void *(*realloc)(void *, size_t), void (*free)(void *)) +{ + reftable_malloc_ptr = malloc; + reftable_realloc_ptr = realloc; + reftable_free_ptr = free; +} + +int hash_size(uint32_t id) +{ + switch (id) { + case 0: + case GIT_SHA1_FORMAT_ID: + return GIT_SHA1_RAWSZ; + case GIT_SHA256_FORMAT_ID: + return GIT_SHA256_RAWSZ; + } + abort(); +} diff --git a/reftable/reftable-malloc.h b/reftable/reftable-malloc.h new file mode 100644 index 00000000000..5f2185f1f34 --- /dev/null +++ b/reftable/reftable-malloc.h @@ -0,0 +1,18 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_H +#define REFTABLE_H + +#include + +/* Overrides the functions to use for memory management. */ +void reftable_set_alloc(void *(*malloc)(size_t), + void *(*realloc)(void *, size_t), void (*free)(void *)); + +#endif diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h new file mode 100644 index 00000000000..5e7698ae654 --- /dev/null +++ b/reftable/reftable-tests.h @@ -0,0 +1,22 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_TESTS_H +#define REFTABLE_TESTS_H + +int basics_test_main(int argc, const char **argv); +int block_test_main(int argc, const char **argv); +int merged_test_main(int argc, const char **argv); +int record_test_main(int argc, const char **argv); +int refname_test_main(int argc, const char **argv); +int reftable_test_main(int argc, const char **argv); +int stack_test_main(int argc, const char **argv); +int tree_test_main(int argc, const char **argv); +int reftable_dump_main(int argc, char *const *argv); + +#endif diff --git a/reftable/system.h b/reftable/system.h new file mode 100644 index 00000000000..4907306c0c5 --- /dev/null +++ b/reftable/system.h @@ -0,0 +1,32 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef SYSTEM_H +#define SYSTEM_H + +/* This header glues the reftable library to the rest of Git */ + +#include "git-compat-util.h" +#include "strbuf.h" +#include "hash.h" /* hash ID, sizes.*/ +#include "dir.h" /* remove_dir_recursively, for tests.*/ + +#include + +#ifdef NO_UNCOMPRESS2 +/* + * This is uncompress2, which is only available in zlib >= 1.2.9 + * (released as of early 2017) + */ +int uncompress2(Bytef *dest, uLongf *destLen, const Bytef *source, + uLong *sourceLen); +#endif + +int hash_size(uint32_t id); + +#endif diff --git a/reftable/test_framework.c b/reftable/test_framework.c new file mode 100644 index 00000000000..84ac972cad0 --- /dev/null +++ b/reftable/test_framework.c @@ -0,0 +1,23 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" +#include "test_framework.h" + +#include "basics.h" + +void set_test_hash(uint8_t *p, int i) +{ + memset(p, (uint8_t)i, hash_size(GIT_SHA1_FORMAT_ID)); +} + +ssize_t strbuf_add_void(void *b, const void *data, size_t sz) +{ + strbuf_add(b, data, sz); + return sz; +} diff --git a/reftable/test_framework.h b/reftable/test_framework.h new file mode 100644 index 00000000000..774cb275bf6 --- /dev/null +++ b/reftable/test_framework.h @@ -0,0 +1,53 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef TEST_FRAMEWORK_H +#define TEST_FRAMEWORK_H + +#include "system.h" +#include "reftable-error.h" + +#define EXPECT_ERR(c) \ + if (c != 0) { \ + fflush(stderr); \ + fflush(stdout); \ + fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", \ + __FILE__, __LINE__, c, reftable_error_str(c)); \ + abort(); \ + } + +#define EXPECT_STREQ(a, b) \ + if (strcmp(a, b)) { \ + fflush(stderr); \ + fflush(stdout); \ + fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \ + __LINE__, #a, a, #b, b); \ + abort(); \ + } + +#define EXPECT(c) \ + if (!(c)) { \ + fflush(stderr); \ + fflush(stdout); \ + fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \ + __LINE__, #c); \ + abort(); \ + } + +#define RUN_TEST(f) \ + fprintf(stderr, "running %s\n", #f); \ + fflush(stderr); \ + f(); + +void set_test_hash(uint8_t *p, int i); + +/* Like strbuf_add, but suitable for passing to reftable_new_writer + */ +ssize_t strbuf_add_void(void *b, const void *data, size_t sz); + +#endif diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c new file mode 100644 index 00000000000..3b58e423e7b --- /dev/null +++ b/t/helper/test-reftable.c @@ -0,0 +1,9 @@ +#include "reftable/reftable-tests.h" +#include "test-tool.h" + +int cmd__reftable(int argc, const char **argv) +{ + basics_test_main(argc, argv); + + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 3ce5585e53a..f7c888ffda7 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -53,13 +53,14 @@ static struct test_cmd cmds[] = { { "pcre2-config", cmd__pcre2_config }, { "pkt-line", cmd__pkt_line }, { "prio-queue", cmd__prio_queue }, - { "proc-receive", cmd__proc_receive}, + { "proc-receive", cmd__proc_receive }, { "progress", cmd__progress }, { "reach", cmd__reach }, { "read-cache", cmd__read_cache }, { "read-graph", cmd__read_graph }, { "read-midx", cmd__read_midx }, { "ref-store", cmd__ref_store }, + { "reftable", cmd__reftable }, { "regex", cmd__regex }, { "repository", cmd__repository }, { "revision-walking", cmd__revision_walking }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 9f0f5228508..25f77469146 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -49,6 +49,7 @@ int cmd__read_cache(int argc, const char **argv); int cmd__read_graph(int argc, const char **argv); int cmd__read_midx(int argc, const char **argv); int cmd__ref_store(int argc, const char **argv); +int cmd__reftable(int argc, const char **argv); int cmd__regex(int argc, const char **argv); int cmd__repository(int argc, const char **argv); int cmd__revision_walking(int argc, const char **argv); diff --git a/t/t0032-reftable-unittest.sh b/t/t0032-reftable-unittest.sh new file mode 100755 index 00000000000..0ed14971a58 --- /dev/null +++ b/t/t0032-reftable-unittest.sh @@ -0,0 +1,15 @@ +#!/bin/sh +# +# Copyright (c) 2020 Google LLC +# + +test_description='reftable unittests' + +. ./test-lib.sh + +test_expect_success 'unittests' ' + TMPDIR=$(pwd) && export TMPDIR && + test-tool reftable +' + +test_done From patchwork Thu Sep 9 18:47:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05D0AC433FE for ; Thu, 9 Sep 2021 18:48:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E36EA610FF for ; Thu, 9 Sep 2021 18:48:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245420AbhIIStp (ORCPT ); Thu, 9 Sep 2021 14:49:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244661AbhIIStB (ORCPT ); Thu, 9 Sep 2021 14:49:01 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B89ABC061766 for ; Thu, 9 Sep 2021 11:47:50 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id g16so3958510wrb.3 for ; Thu, 09 Sep 2021 11:47:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=0H7sIllJ5diD/aIGOgMpHSEiTdVmkZz40GtDvbxm9BA=; b=CO4gKLYqbPEzabNnibP/oQcIQxz99hl6la3oGzm9sg8m7xOrhhR9QHd9aYHNTcrWKD A3nV6cl3FryUXy1CT9PVgff5lASB1i6unYAzx9hH7NGDC2uMcx7oUUw2alPACjlRZ1PC 6DusVkKclNt2CqXTp9YG6/MBIIxM2FxtkXJmwC7zekchzkgTVmSWFGRTuY3srNh8o7lg hAHzvTOOv9OXRZY+6Ld/SfxqDBzlJqCLWuRqTpLDSF9R8K4Olif/2dBGyLsFuE+lI75O /Fi9LFqzbvoTF7J8IESrn9Poqj86z3H4bf6SjktYuSzQgUuN8Npk2AWCnZkm1v/nR8g0 OrUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=0H7sIllJ5diD/aIGOgMpHSEiTdVmkZz40GtDvbxm9BA=; b=zurkHAin5YjhSyZMWDtofriSzkYC6jAlNsD9aJFo+se0Envq1zsc3ZtYQZuOAE5LXc FZyd5NzRqC4tTH2jzUvn8rmbYXalCUU7hd7Z6oJ1Fzz/o6R3UKQUk94inSUV5xvIgmjM jYMppXnw/6xvo0QJyywnctbyr/CZGXdPFcbTuwcS8HiaiNH9HV7XU7bngS+rfn9iILBL yALk7j6awMbLSalT+8zgVYS561TgLT7GgYINEqIOXoNC+aC5PznOPT/tjTITddVulnfY bdxJaYmkWrHbXtx7b1SEBHyuoUv9umPgKox7dlW98CaB9PZlsU+4gLbh9M4C8TcCp3jk 4oPQ== X-Gm-Message-State: AOAM532AeYMkdoQ4WO+kDRqhGpdnk6d8abM/pdSUNMs344PhUUcfUjnS KW4kc/vJDXd2oQDDz2ZxjvTM6+Hvvjc= X-Google-Smtp-Source: ABdhPJykIPaewJ44O4W0l2AHBsNkewHmwmCy9in4hv+tvnI+VNgBXWHlSWW/7AaCKQDq/0t5W1fYRA== X-Received: by 2002:adf:b781:: with SMTP id s1mr5451045wre.165.1631213269355; Thu, 09 Sep 2021 11:47:49 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id f3sm2231949wmj.28.2021.09.09.11.47.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:49 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:30 +0000 Subject: [PATCH v2 05/19] reftable: add blocksource, an abstraction for random access reads Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The reftable format is usually used with files for storage. However, we abstract away this using the blocksource data structure. This has two advantages: * log blocks are zlib compressed, and handling them is simplified if we can discard byte segments from within the block layer. * for unittests, it is useful to read and write in-memory. The blocksource allows us to abstract the data away from on-disk files. Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + reftable/blocksource.c | 148 ++++++++++++++++++++++++++++++++ reftable/blocksource.h | 22 +++++ reftable/reftable-blocksource.h | 49 +++++++++++ 4 files changed, 220 insertions(+) create mode 100644 reftable/blocksource.c create mode 100644 reftable/blocksource.h create mode 100644 reftable/reftable-blocksource.h diff --git a/Makefile b/Makefile index 0bf5e49aa61..c7ffc27a727 100644 --- a/Makefile +++ b/Makefile @@ -2451,6 +2451,7 @@ xdiff-objs: $(XDIFF_OBJS) REFTABLE_OBJS += reftable/basics.o REFTABLE_OBJS += reftable/error.o +REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o REFTABLE_TEST_OBJS += reftable/test_framework.o diff --git a/reftable/blocksource.c b/reftable/blocksource.c new file mode 100644 index 00000000000..0044eecd9aa --- /dev/null +++ b/reftable/blocksource.c @@ -0,0 +1,148 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" + +#include "basics.h" +#include "blocksource.h" +#include "reftable-blocksource.h" +#include "reftable-error.h" + +static void strbuf_return_block(void *b, struct reftable_block *dest) +{ + memset(dest->data, 0xff, dest->len); + reftable_free(dest->data); +} + +static void strbuf_close(void *b) +{ +} + +static int strbuf_read_block(void *v, struct reftable_block *dest, uint64_t off, + uint32_t size) +{ + struct strbuf *b = v; + assert(off + size <= b->len); + dest->data = reftable_calloc(size); + memcpy(dest->data, b->buf + off, size); + dest->len = size; + return size; +} + +static uint64_t strbuf_size(void *b) +{ + return ((struct strbuf *)b)->len; +} + +static struct reftable_block_source_vtable strbuf_vtable = { + .size = &strbuf_size, + .read_block = &strbuf_read_block, + .return_block = &strbuf_return_block, + .close = &strbuf_close, +}; + +void block_source_from_strbuf(struct reftable_block_source *bs, + struct strbuf *buf) +{ + assert(!bs->ops); + bs->ops = &strbuf_vtable; + bs->arg = buf; +} + +static void malloc_return_block(void *b, struct reftable_block *dest) +{ + memset(dest->data, 0xff, dest->len); + reftable_free(dest->data); +} + +static struct reftable_block_source_vtable malloc_vtable = { + .return_block = &malloc_return_block, +}; + +static struct reftable_block_source malloc_block_source_instance = { + .ops = &malloc_vtable, +}; + +struct reftable_block_source malloc_block_source(void) +{ + return malloc_block_source_instance; +} + +struct file_block_source { + int fd; + uint64_t size; +}; + +static uint64_t file_size(void *b) +{ + return ((struct file_block_source *)b)->size; +} + +static void file_return_block(void *b, struct reftable_block *dest) +{ + memset(dest->data, 0xff, dest->len); + reftable_free(dest->data); +} + +static void file_close(void *b) +{ + int fd = ((struct file_block_source *)b)->fd; + if (fd > 0) { + close(fd); + ((struct file_block_source *)b)->fd = 0; + } + + reftable_free(b); +} + +static int file_read_block(void *v, struct reftable_block *dest, uint64_t off, + uint32_t size) +{ + struct file_block_source *b = v; + assert(off + size <= b->size); + dest->data = reftable_malloc(size); + if (pread(b->fd, dest->data, size, off) != size) + return -1; + dest->len = size; + return size; +} + +static struct reftable_block_source_vtable file_vtable = { + .size = &file_size, + .read_block = &file_read_block, + .return_block = &file_return_block, + .close = &file_close, +}; + +int reftable_block_source_from_file(struct reftable_block_source *bs, + const char *name) +{ + struct stat st = { 0 }; + int err = 0; + int fd = open(name, O_RDONLY); + struct file_block_source *p = NULL; + if (fd < 0) { + if (errno == ENOENT) { + return REFTABLE_NOT_EXIST_ERROR; + } + return -1; + } + + err = fstat(fd, &st); + if (err < 0) + return -1; + + p = reftable_calloc(sizeof(struct file_block_source)); + p->size = st.st_size; + p->fd = fd; + + assert(!bs->ops); + bs->ops = &file_vtable; + bs->arg = p; + return 0; +} diff --git a/reftable/blocksource.h b/reftable/blocksource.h new file mode 100644 index 00000000000..072e2727ad2 --- /dev/null +++ b/reftable/blocksource.h @@ -0,0 +1,22 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef BLOCKSOURCE_H +#define BLOCKSOURCE_H + +#include "system.h" + +struct reftable_block_source; + +/* Create an in-memory block source for reading reftables */ +void block_source_from_strbuf(struct reftable_block_source *bs, + struct strbuf *buf); + +struct reftable_block_source malloc_block_source(void); + +#endif diff --git a/reftable/reftable-blocksource.h b/reftable/reftable-blocksource.h new file mode 100644 index 00000000000..5aa3990a573 --- /dev/null +++ b/reftable/reftable-blocksource.h @@ -0,0 +1,49 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_BLOCKSOURCE_H +#define REFTABLE_BLOCKSOURCE_H + +#include + +/* block_source is a generic wrapper for a seekable readable file. + */ +struct reftable_block_source { + struct reftable_block_source_vtable *ops; + void *arg; +}; + +/* a contiguous segment of bytes. It keeps track of its generating block_source + * so it can return itself into the pool. */ +struct reftable_block { + uint8_t *data; + int len; + struct reftable_block_source source; +}; + +/* block_source_vtable are the operations that make up block_source */ +struct reftable_block_source_vtable { + /* returns the size of a block source */ + uint64_t (*size)(void *source); + + /* reads a segment from the block source. It is an error to read + beyond the end of the block */ + int (*read_block)(void *source, struct reftable_block *dest, + uint64_t off, uint32_t size); + /* mark the block as read; may return the data back to malloc */ + void (*return_block)(void *source, struct reftable_block *blockp); + + /* release all resources associated with the block source */ + void (*close)(void *source); +}; + +/* opens a file on the file system as a block_source */ +int reftable_block_source_from_file(struct reftable_block_source *block_src, + const char *name); + +#endif From patchwork Thu Sep 9 18:47:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA55BC433EF for ; Thu, 9 Sep 2021 18:48:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8A8A0610FF for ; Thu, 9 Sep 2021 18:48:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242890AbhIIStf (ORCPT ); Thu, 9 Sep 2021 14:49:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244684AbhIIStB (ORCPT ); Thu, 9 Sep 2021 14:49:01 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4A5CC061574 for ; Thu, 9 Sep 2021 11:47:51 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id b6so3933970wrh.10 for ; Thu, 09 Sep 2021 11:47:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=IfNyn3ePSsRhtmyDZx6a69shPXwaT2jVxx1RTEvwkoE=; b=cBPwJWb+eTqd+BzgqYBo0HMsLBDB4GedlZ2DibRH3/lj1LmpC4k9rFEmpyXVZ70PtJ Xb2vzmxa30V9PawB9kK1mQEwnG3KUop35tJO9b+xW/AqD8xF0/A1ihb3E+B2bqjtreDo U3bpqA9HJz/6UXchpNicNeyvyAuEQyHUEdysHZUpZqB0p76mEnMAxWSoxmHAkAQZs2DH ZMBzZSdgh9Kb6mY0CAw0VaOgxHmKnSBrY8qkLQvVYewqndYxWFSzGEpfWuXLhS7TzlmB T1tOJKkRi5qjJo9pdepQO57Ts4dRNx2DIMlJ05K3h9T3qBUJD6/kc85QPlWWVXhTs/br cEwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=IfNyn3ePSsRhtmyDZx6a69shPXwaT2jVxx1RTEvwkoE=; b=d/6Kcg0OwlNfER1jPNWyCG+1lDUGwBpt93JbbVAUY9fj0gHRX6JezVkfHzEZQgTMWS UvAYT7AFKBfGSZ6gjyA7dTY/P9/hCgzZ1ShvfutiE9z5W3x0liIcukJAQMx57BG6/KLN 7Ik59F2mkfQlOpwZTZaH9wUXDkBVICg7UGm0NTBl4fNCkA8VgGgy/EzPbZJTgx90iX6e XzYki4m1boXyLkRgOo0wXweSj9tlWmGwXVpDoVGc2xFDRj8bACwpMKDco0vy2mHOe+Ul GSIn2Lv6ElAIRCfnjMArOAhvEynE6ta/wQRjXvTTkxBB+9sDJOLm69bdY318qtiWNFoU k9ew== X-Gm-Message-State: AOAM530qn9ukRv6MzGH9zXXh3ztM+c8i4wFbz2BAokRCACS857ljO1cp ZLJnntwaS8g0a+SNZAr7ATV9jcAIowc= X-Google-Smtp-Source: ABdhPJwrEEx4bre+7r2byhqHB04g/vhWYKeLrFlg+tXltpkn5O8VN2EBgbwWR3FYEqXuFXqGhAwhVg== X-Received: by 2002:adf:9003:: with SMTP id h3mr5360234wrh.75.1631213270040; Thu, 09 Sep 2021 11:47:50 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id r36sm2110075wmp.33.2021.09.09.11.47.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:49 -0700 (PDT) Message-Id: <651e195fe771a37b23140cf774435086e3f8f752.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:31 +0000 Subject: [PATCH v2 06/19] reftable: (de)serialization for the polymorphic record type. Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The reftable format is structured as a sequence of blocks, and each block contains a sequence of prefix-compressed key-value records. There are 4 types of records, and they have similarities in how they must be handled. This is achieved by introducing a polymorphic 'record' type that encapsulates ref, log, index and object records. Signed-off-by: Han-Wen Nienhuys --- Makefile | 2 + reftable/constants.h | 21 + reftable/record.c | 1212 ++++++++++++++++++++++++++++++++++++ reftable/record.h | 139 +++++ reftable/record_test.c | 412 ++++++++++++ reftable/reftable-record.h | 114 ++++ t/helper/test-reftable.c | 2 +- 7 files changed, 1901 insertions(+), 1 deletion(-) create mode 100644 reftable/constants.h create mode 100644 reftable/record.c create mode 100644 reftable/record.h create mode 100644 reftable/record_test.c create mode 100644 reftable/reftable-record.h diff --git a/Makefile b/Makefile index c7ffc27a727..14d025ea424 100644 --- a/Makefile +++ b/Makefile @@ -2453,7 +2453,9 @@ REFTABLE_OBJS += reftable/basics.o REFTABLE_OBJS += reftable/error.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o +REFTABLE_OBJS += reftable/record.o +REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o REFTABLE_TEST_OBJS += reftable/basics_test.o diff --git a/reftable/constants.h b/reftable/constants.h new file mode 100644 index 00000000000..5eee72c4c11 --- /dev/null +++ b/reftable/constants.h @@ -0,0 +1,21 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef CONSTANTS_H +#define CONSTANTS_H + +#define BLOCK_TYPE_LOG 'g' +#define BLOCK_TYPE_INDEX 'i' +#define BLOCK_TYPE_REF 'r' +#define BLOCK_TYPE_OBJ 'o' +#define BLOCK_TYPE_ANY 0 + +#define MAX_RESTARTS ((1 << 16) - 1) +#define DEFAULT_BLOCK_SIZE 4096 + +#endif diff --git a/reftable/record.c b/reftable/record.c new file mode 100644 index 00000000000..6a5dac32dc6 --- /dev/null +++ b/reftable/record.c @@ -0,0 +1,1212 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +/* record.c - methods for different types of records. */ + +#include "record.h" + +#include "system.h" +#include "constants.h" +#include "reftable-error.h" +#include "basics.h" + +int get_var_int(uint64_t *dest, struct string_view *in) +{ + int ptr = 0; + uint64_t val; + + if (in->len == 0) + return -1; + val = in->buf[ptr] & 0x7f; + + while (in->buf[ptr] & 0x80) { + ptr++; + if (ptr > in->len) { + return -1; + } + val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f); + } + + *dest = val; + return ptr + 1; +} + +int put_var_int(struct string_view *dest, uint64_t val) +{ + uint8_t buf[10] = { 0 }; + int i = 9; + int n = 0; + buf[i] = (uint8_t)(val & 0x7f); + i--; + while (1) { + val >>= 7; + if (!val) { + break; + } + val--; + buf[i] = 0x80 | (uint8_t)(val & 0x7f); + i--; + } + + n = sizeof(buf) - i - 1; + if (dest->len < n) + return -1; + memcpy(dest->buf, &buf[i + 1], n); + return n; +} + +int reftable_is_block_type(uint8_t typ) +{ + switch (typ) { + case BLOCK_TYPE_REF: + case BLOCK_TYPE_LOG: + case BLOCK_TYPE_OBJ: + case BLOCK_TYPE_INDEX: + return 1; + } + return 0; +} + +uint8_t *reftable_ref_record_val1(struct reftable_ref_record *rec) +{ + switch (rec->value_type) { + case REFTABLE_REF_VAL1: + return rec->value.val1; + case REFTABLE_REF_VAL2: + return rec->value.val2.value; + default: + return NULL; + } +} + +uint8_t *reftable_ref_record_val2(struct reftable_ref_record *rec) +{ + switch (rec->value_type) { + case REFTABLE_REF_VAL2: + return rec->value.val2.target_value; + default: + return NULL; + } +} + +static int decode_string(struct strbuf *dest, struct string_view in) +{ + int start_len = in.len; + uint64_t tsize = 0; + int n = get_var_int(&tsize, &in); + if (n <= 0) + return -1; + string_view_consume(&in, n); + if (in.len < tsize) + return -1; + + strbuf_reset(dest); + strbuf_add(dest, in.buf, tsize); + string_view_consume(&in, tsize); + + return start_len - in.len; +} + +static int encode_string(char *str, struct string_view s) +{ + struct string_view start = s; + int l = strlen(str); + int n = put_var_int(&s, l); + if (n < 0) + return -1; + string_view_consume(&s, n); + if (s.len < l) + return -1; + memcpy(s.buf, str, l); + string_view_consume(&s, l); + + return start.len - s.len; +} + +int reftable_encode_key(int *restart, struct string_view dest, + struct strbuf prev_key, struct strbuf key, + uint8_t extra) +{ + struct string_view start = dest; + int prefix_len = common_prefix_size(&prev_key, &key); + uint64_t suffix_len = key.len - prefix_len; + int n = put_var_int(&dest, (uint64_t)prefix_len); + if (n < 0) + return -1; + string_view_consume(&dest, n); + + *restart = (prefix_len == 0); + + n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra); + if (n < 0) + return -1; + string_view_consume(&dest, n); + + if (dest.len < suffix_len) + return -1; + memcpy(dest.buf, key.buf + prefix_len, suffix_len); + string_view_consume(&dest, suffix_len); + + return start.len - dest.len; +} + +int reftable_decode_key(struct strbuf *key, uint8_t *extra, + struct strbuf last_key, struct string_view in) +{ + int start_len = in.len; + uint64_t prefix_len = 0; + uint64_t suffix_len = 0; + int n = get_var_int(&prefix_len, &in); + if (n < 0) + return -1; + string_view_consume(&in, n); + + if (prefix_len > last_key.len) + return -1; + + n = get_var_int(&suffix_len, &in); + if (n <= 0) + return -1; + string_view_consume(&in, n); + + *extra = (uint8_t)(suffix_len & 0x7); + suffix_len >>= 3; + + if (in.len < suffix_len) + return -1; + + strbuf_reset(key); + strbuf_add(key, last_key.buf, prefix_len); + strbuf_add(key, in.buf, suffix_len); + string_view_consume(&in, suffix_len); + + return start_len - in.len; +} + +static void reftable_ref_record_key(const void *r, struct strbuf *dest) +{ + const struct reftable_ref_record *rec = + (const struct reftable_ref_record *)r; + strbuf_reset(dest); + strbuf_addstr(dest, rec->refname); +} + +static void reftable_ref_record_copy_from(void *rec, const void *src_rec, + int hash_size) +{ + struct reftable_ref_record *ref = rec; + const struct reftable_ref_record *src = src_rec; + assert(hash_size > 0); + + /* This is simple and correct, but we could probably reuse the hash + * fields. */ + reftable_ref_record_release(ref); + if (src->refname) { + ref->refname = xstrdup(src->refname); + } + ref->update_index = src->update_index; + ref->value_type = src->value_type; + switch (src->value_type) { + case REFTABLE_REF_DELETION: + break; + case REFTABLE_REF_VAL1: + ref->value.val1 = reftable_malloc(hash_size); + memcpy(ref->value.val1, src->value.val1, hash_size); + break; + case REFTABLE_REF_VAL2: + ref->value.val2.value = reftable_malloc(hash_size); + memcpy(ref->value.val2.value, src->value.val2.value, hash_size); + ref->value.val2.target_value = reftable_malloc(hash_size); + memcpy(ref->value.val2.target_value, + src->value.val2.target_value, hash_size); + break; + case REFTABLE_REF_SYMREF: + ref->value.symref = xstrdup(src->value.symref); + break; + } +} + +static char hexdigit(int c) +{ + if (c <= 9) + return '0' + c; + return 'a' + (c - 10); +} + +static void hex_format(char *dest, uint8_t *src, int hash_size) +{ + assert(hash_size > 0); + if (src) { + int i = 0; + for (i = 0; i < hash_size; i++) { + dest[2 * i] = hexdigit(src[i] >> 4); + dest[2 * i + 1] = hexdigit(src[i] & 0xf); + } + dest[2 * hash_size] = 0; + } +} + +void reftable_ref_record_print(struct reftable_ref_record *ref, + uint32_t hash_id) +{ + char hex[2 * GIT_SHA256_RAWSZ + 1] = { 0 }; /* BUG */ + printf("ref{%s(%" PRIu64 ") ", ref->refname, ref->update_index); + switch (ref->value_type) { + case REFTABLE_REF_SYMREF: + printf("=> %s", ref->value.symref); + break; + case REFTABLE_REF_VAL2: + hex_format(hex, ref->value.val2.value, hash_size(hash_id)); + printf("val 2 %s", hex); + hex_format(hex, ref->value.val2.target_value, + hash_size(hash_id)); + printf("(T %s)", hex); + break; + case REFTABLE_REF_VAL1: + hex_format(hex, ref->value.val1, hash_size(hash_id)); + printf("val 1 %s", hex); + break; + case REFTABLE_REF_DELETION: + printf("delete"); + break; + } + printf("}\n"); +} + +static void reftable_ref_record_release_void(void *rec) +{ + reftable_ref_record_release(rec); +} + +void reftable_ref_record_release(struct reftable_ref_record *ref) +{ + switch (ref->value_type) { + case REFTABLE_REF_SYMREF: + reftable_free(ref->value.symref); + break; + case REFTABLE_REF_VAL2: + reftable_free(ref->value.val2.target_value); + reftable_free(ref->value.val2.value); + break; + case REFTABLE_REF_VAL1: + reftable_free(ref->value.val1); + break; + case REFTABLE_REF_DELETION: + break; + default: + abort(); + } + + reftable_free(ref->refname); + memset(ref, 0, sizeof(struct reftable_ref_record)); +} + +static uint8_t reftable_ref_record_val_type(const void *rec) +{ + const struct reftable_ref_record *r = + (const struct reftable_ref_record *)rec; + return r->value_type; +} + +static int reftable_ref_record_encode(const void *rec, struct string_view s, + int hash_size) +{ + const struct reftable_ref_record *r = + (const struct reftable_ref_record *)rec; + struct string_view start = s; + int n = put_var_int(&s, r->update_index); + assert(hash_size > 0); + if (n < 0) + return -1; + string_view_consume(&s, n); + + switch (r->value_type) { + case REFTABLE_REF_SYMREF: + n = encode_string(r->value.symref, s); + if (n < 0) { + return -1; + } + string_view_consume(&s, n); + break; + case REFTABLE_REF_VAL2: + if (s.len < 2 * hash_size) { + return -1; + } + memcpy(s.buf, r->value.val2.value, hash_size); + string_view_consume(&s, hash_size); + memcpy(s.buf, r->value.val2.target_value, hash_size); + string_view_consume(&s, hash_size); + break; + case REFTABLE_REF_VAL1: + if (s.len < hash_size) { + return -1; + } + memcpy(s.buf, r->value.val1, hash_size); + string_view_consume(&s, hash_size); + break; + case REFTABLE_REF_DELETION: + break; + default: + abort(); + } + + return start.len - s.len; +} + +static int reftable_ref_record_decode(void *rec, struct strbuf key, + uint8_t val_type, struct string_view in, + int hash_size) +{ + struct reftable_ref_record *r = rec; + struct string_view start = in; + uint64_t update_index = 0; + int n = get_var_int(&update_index, &in); + if (n < 0) + return n; + string_view_consume(&in, n); + + reftable_ref_record_release(r); + + assert(hash_size > 0); + + r->refname = reftable_realloc(r->refname, key.len + 1); + memcpy(r->refname, key.buf, key.len); + r->update_index = update_index; + r->refname[key.len] = 0; + r->value_type = val_type; + switch (val_type) { + case REFTABLE_REF_VAL1: + if (in.len < hash_size) { + return -1; + } + + r->value.val1 = reftable_malloc(hash_size); + memcpy(r->value.val1, in.buf, hash_size); + string_view_consume(&in, hash_size); + break; + + case REFTABLE_REF_VAL2: + if (in.len < 2 * hash_size) { + return -1; + } + + r->value.val2.value = reftable_malloc(hash_size); + memcpy(r->value.val2.value, in.buf, hash_size); + string_view_consume(&in, hash_size); + + r->value.val2.target_value = reftable_malloc(hash_size); + memcpy(r->value.val2.target_value, in.buf, hash_size); + string_view_consume(&in, hash_size); + break; + + case REFTABLE_REF_SYMREF: { + struct strbuf dest = STRBUF_INIT; + int n = decode_string(&dest, in); + if (n < 0) { + return -1; + } + string_view_consume(&in, n); + r->value.symref = dest.buf; + } break; + + case REFTABLE_REF_DELETION: + break; + default: + abort(); + break; + } + + return start.len - in.len; +} + +static int reftable_ref_record_is_deletion_void(const void *p) +{ + return reftable_ref_record_is_deletion( + (const struct reftable_ref_record *)p); +} + +static struct reftable_record_vtable reftable_ref_record_vtable = { + .key = &reftable_ref_record_key, + .type = BLOCK_TYPE_REF, + .copy_from = &reftable_ref_record_copy_from, + .val_type = &reftable_ref_record_val_type, + .encode = &reftable_ref_record_encode, + .decode = &reftable_ref_record_decode, + .release = &reftable_ref_record_release_void, + .is_deletion = &reftable_ref_record_is_deletion_void, +}; + +static void reftable_obj_record_key(const void *r, struct strbuf *dest) +{ + const struct reftable_obj_record *rec = + (const struct reftable_obj_record *)r; + strbuf_reset(dest); + strbuf_add(dest, rec->hash_prefix, rec->hash_prefix_len); +} + +static void reftable_obj_record_release(void *rec) +{ + struct reftable_obj_record *obj = rec; + FREE_AND_NULL(obj->hash_prefix); + FREE_AND_NULL(obj->offsets); + memset(obj, 0, sizeof(struct reftable_obj_record)); +} + +static void reftable_obj_record_copy_from(void *rec, const void *src_rec, + int hash_size) +{ + struct reftable_obj_record *obj = rec; + const struct reftable_obj_record *src = + (const struct reftable_obj_record *)src_rec; + + reftable_obj_record_release(obj); + *obj = *src; + obj->hash_prefix = reftable_malloc(obj->hash_prefix_len); + memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len); + + obj->offsets = reftable_malloc(obj->offset_len * sizeof(uint64_t)); + COPY_ARRAY(obj->offsets, src->offsets, obj->offset_len); +} + +static uint8_t reftable_obj_record_val_type(const void *rec) +{ + const struct reftable_obj_record *r = rec; + if (r->offset_len > 0 && r->offset_len < 8) + return r->offset_len; + return 0; +} + +static int reftable_obj_record_encode(const void *rec, struct string_view s, + int hash_size) +{ + const struct reftable_obj_record *r = rec; + struct string_view start = s; + int i = 0; + int n = 0; + uint64_t last = 0; + if (r->offset_len == 0 || r->offset_len >= 8) { + n = put_var_int(&s, r->offset_len); + if (n < 0) { + return -1; + } + string_view_consume(&s, n); + } + if (r->offset_len == 0) + return start.len - s.len; + n = put_var_int(&s, r->offsets[0]); + if (n < 0) + return -1; + string_view_consume(&s, n); + + last = r->offsets[0]; + for (i = 1; i < r->offset_len; i++) { + int n = put_var_int(&s, r->offsets[i] - last); + if (n < 0) { + return -1; + } + string_view_consume(&s, n); + last = r->offsets[i]; + } + return start.len - s.len; +} + +static int reftable_obj_record_decode(void *rec, struct strbuf key, + uint8_t val_type, struct string_view in, + int hash_size) +{ + struct string_view start = in; + struct reftable_obj_record *r = rec; + uint64_t count = val_type; + int n = 0; + uint64_t last; + int j; + r->hash_prefix = reftable_malloc(key.len); + memcpy(r->hash_prefix, key.buf, key.len); + r->hash_prefix_len = key.len; + + if (val_type == 0) { + n = get_var_int(&count, &in); + if (n < 0) { + return n; + } + + string_view_consume(&in, n); + } + + r->offsets = NULL; + r->offset_len = 0; + if (count == 0) + return start.len - in.len; + + r->offsets = reftable_malloc(count * sizeof(uint64_t)); + r->offset_len = count; + + n = get_var_int(&r->offsets[0], &in); + if (n < 0) + return n; + string_view_consume(&in, n); + + last = r->offsets[0]; + j = 1; + while (j < count) { + uint64_t delta = 0; + int n = get_var_int(&delta, &in); + if (n < 0) { + return n; + } + string_view_consume(&in, n); + + last = r->offsets[j] = (delta + last); + j++; + } + return start.len - in.len; +} + +static int not_a_deletion(const void *p) +{ + return 0; +} + +static struct reftable_record_vtable reftable_obj_record_vtable = { + .key = &reftable_obj_record_key, + .type = BLOCK_TYPE_OBJ, + .copy_from = &reftable_obj_record_copy_from, + .val_type = &reftable_obj_record_val_type, + .encode = &reftable_obj_record_encode, + .decode = &reftable_obj_record_decode, + .release = &reftable_obj_record_release, + .is_deletion = not_a_deletion, +}; + +void reftable_log_record_print(struct reftable_log_record *log, + uint32_t hash_id) +{ + char hex[GIT_SHA256_RAWSZ + 1] = { 0 }; + + switch (log->value_type) { + case REFTABLE_LOG_DELETION: + printf("log{%s(%" PRIu64 ") delete", log->refname, + log->update_index); + break; + case REFTABLE_LOG_UPDATE: + printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", + log->refname, log->update_index, log->value.update.name, + log->value.update.email, log->value.update.time, + log->value.update.tz_offset); + hex_format(hex, log->value.update.old_hash, hash_size(hash_id)); + printf("%s => ", hex); + hex_format(hex, log->value.update.new_hash, hash_size(hash_id)); + printf("%s\n\n%s\n}\n", hex, log->value.update.message); + break; + } +} + +static void reftable_log_record_key(const void *r, struct strbuf *dest) +{ + const struct reftable_log_record *rec = + (const struct reftable_log_record *)r; + int len = strlen(rec->refname); + uint8_t i64[8]; + uint64_t ts = 0; + strbuf_reset(dest); + strbuf_add(dest, (uint8_t *)rec->refname, len + 1); + + ts = (~ts) - rec->update_index; + put_be64(&i64[0], ts); + strbuf_add(dest, i64, sizeof(i64)); +} + +static void reftable_log_record_copy_from(void *rec, const void *src_rec, + int hash_size) +{ + struct reftable_log_record *dst = rec; + const struct reftable_log_record *src = + (const struct reftable_log_record *)src_rec; + + reftable_log_record_release(dst); + *dst = *src; + if (dst->refname) { + dst->refname = xstrdup(dst->refname); + } + switch (dst->value_type) { + case REFTABLE_LOG_DELETION: + break; + case REFTABLE_LOG_UPDATE: + if (dst->value.update.email) { + dst->value.update.email = + xstrdup(dst->value.update.email); + } + if (dst->value.update.name) { + dst->value.update.name = + xstrdup(dst->value.update.name); + } + if (dst->value.update.message) { + dst->value.update.message = + xstrdup(dst->value.update.message); + } + + if (dst->value.update.new_hash) { + dst->value.update.new_hash = reftable_malloc(hash_size); + memcpy(dst->value.update.new_hash, + src->value.update.new_hash, hash_size); + } + if (dst->value.update.old_hash) { + dst->value.update.old_hash = reftable_malloc(hash_size); + memcpy(dst->value.update.old_hash, + src->value.update.old_hash, hash_size); + } + break; + } +} + +static void reftable_log_record_release_void(void *rec) +{ + struct reftable_log_record *r = rec; + reftable_log_record_release(r); +} + +void reftable_log_record_release(struct reftable_log_record *r) +{ + reftable_free(r->refname); + switch (r->value_type) { + case REFTABLE_LOG_DELETION: + break; + case REFTABLE_LOG_UPDATE: + reftable_free(r->value.update.new_hash); + reftable_free(r->value.update.old_hash); + reftable_free(r->value.update.name); + reftable_free(r->value.update.email); + reftable_free(r->value.update.message); + break; + } + memset(r, 0, sizeof(struct reftable_log_record)); +} + +static uint8_t reftable_log_record_val_type(const void *rec) +{ + const struct reftable_log_record *log = + (const struct reftable_log_record *)rec; + + return reftable_log_record_is_deletion(log) ? 0 : 1; +} + +static uint8_t zero[GIT_SHA256_RAWSZ] = { 0 }; + +static int reftable_log_record_encode(const void *rec, struct string_view s, + int hash_size) +{ + const struct reftable_log_record *r = rec; + struct string_view start = s; + int n = 0; + uint8_t *oldh = NULL; + uint8_t *newh = NULL; + if (reftable_log_record_is_deletion(r)) + return 0; + + oldh = r->value.update.old_hash; + newh = r->value.update.new_hash; + if (!oldh) { + oldh = zero; + } + if (!newh) { + newh = zero; + } + + if (s.len < 2 * hash_size) + return -1; + + memcpy(s.buf, oldh, hash_size); + memcpy(s.buf + hash_size, newh, hash_size); + string_view_consume(&s, 2 * hash_size); + + n = encode_string(r->value.update.name ? r->value.update.name : "", s); + if (n < 0) + return -1; + string_view_consume(&s, n); + + n = encode_string(r->value.update.email ? r->value.update.email : "", + s); + if (n < 0) + return -1; + string_view_consume(&s, n); + + n = put_var_int(&s, r->value.update.time); + if (n < 0) + return -1; + string_view_consume(&s, n); + + if (s.len < 2) + return -1; + + put_be16(s.buf, r->value.update.tz_offset); + string_view_consume(&s, 2); + + n = encode_string( + r->value.update.message ? r->value.update.message : "", s); + if (n < 0) + return -1; + string_view_consume(&s, n); + + return start.len - s.len; +} + +static int reftable_log_record_decode(void *rec, struct strbuf key, + uint8_t val_type, struct string_view in, + int hash_size) +{ + struct string_view start = in; + struct reftable_log_record *r = rec; + uint64_t max = 0; + uint64_t ts = 0; + struct strbuf dest = STRBUF_INIT; + int n; + + if (key.len <= 9 || key.buf[key.len - 9] != 0) + return REFTABLE_FORMAT_ERROR; + + r->refname = reftable_realloc(r->refname, key.len - 8); + memcpy(r->refname, key.buf, key.len - 8); + ts = get_be64(key.buf + key.len - 8); + + r->update_index = (~max) - ts; + + if (val_type != r->value_type) { + switch (r->value_type) { + case REFTABLE_LOG_UPDATE: + FREE_AND_NULL(r->value.update.old_hash); + FREE_AND_NULL(r->value.update.new_hash); + FREE_AND_NULL(r->value.update.message); + FREE_AND_NULL(r->value.update.email); + FREE_AND_NULL(r->value.update.name); + break; + case REFTABLE_LOG_DELETION: + break; + } + } + + r->value_type = val_type; + if (val_type == REFTABLE_LOG_DELETION) + return 0; + + if (in.len < 2 * hash_size) + return REFTABLE_FORMAT_ERROR; + + r->value.update.old_hash = + reftable_realloc(r->value.update.old_hash, hash_size); + r->value.update.new_hash = + reftable_realloc(r->value.update.new_hash, hash_size); + + memcpy(r->value.update.old_hash, in.buf, hash_size); + memcpy(r->value.update.new_hash, in.buf + hash_size, hash_size); + + string_view_consume(&in, 2 * hash_size); + + n = decode_string(&dest, in); + if (n < 0) + goto done; + string_view_consume(&in, n); + + r->value.update.name = + reftable_realloc(r->value.update.name, dest.len + 1); + memcpy(r->value.update.name, dest.buf, dest.len); + r->value.update.name[dest.len] = 0; + + strbuf_reset(&dest); + n = decode_string(&dest, in); + if (n < 0) + goto done; + string_view_consume(&in, n); + + r->value.update.email = + reftable_realloc(r->value.update.email, dest.len + 1); + memcpy(r->value.update.email, dest.buf, dest.len); + r->value.update.email[dest.len] = 0; + + ts = 0; + n = get_var_int(&ts, &in); + if (n < 0) + goto done; + string_view_consume(&in, n); + r->value.update.time = ts; + if (in.len < 2) + goto done; + + r->value.update.tz_offset = get_be16(in.buf); + string_view_consume(&in, 2); + + strbuf_reset(&dest); + n = decode_string(&dest, in); + if (n < 0) + goto done; + string_view_consume(&in, n); + + r->value.update.message = + reftable_realloc(r->value.update.message, dest.len + 1); + memcpy(r->value.update.message, dest.buf, dest.len); + r->value.update.message[dest.len] = 0; + + strbuf_release(&dest); + return start.len - in.len; + +done: + strbuf_release(&dest); + return REFTABLE_FORMAT_ERROR; +} + +static int null_streq(char *a, char *b) +{ + char *empty = ""; + if (!a) + a = empty; + + if (!b) + b = empty; + + return 0 == strcmp(a, b); +} + +static int zero_hash_eq(uint8_t *a, uint8_t *b, int sz) +{ + if (!a) + a = zero; + + if (!b) + b = zero; + + return !memcmp(a, b, sz); +} + +int reftable_log_record_equal(struct reftable_log_record *a, + struct reftable_log_record *b, int hash_size) +{ + if (!(null_streq(a->refname, b->refname) && + a->update_index == b->update_index && + a->value_type == b->value_type)) + return 0; + + switch (a->value_type) { + case REFTABLE_LOG_DELETION: + return 1; + case REFTABLE_LOG_UPDATE: + return null_streq(a->value.update.name, b->value.update.name) && + a->value.update.time == b->value.update.time && + a->value.update.tz_offset == b->value.update.tz_offset && + null_streq(a->value.update.email, + b->value.update.email) && + null_streq(a->value.update.message, + b->value.update.message) && + zero_hash_eq(a->value.update.old_hash, + b->value.update.old_hash, hash_size) && + zero_hash_eq(a->value.update.new_hash, + b->value.update.new_hash, hash_size); + } + + abort(); +} + +static int reftable_log_record_is_deletion_void(const void *p) +{ + return reftable_log_record_is_deletion( + (const struct reftable_log_record *)p); +} + +static struct reftable_record_vtable reftable_log_record_vtable = { + .key = &reftable_log_record_key, + .type = BLOCK_TYPE_LOG, + .copy_from = &reftable_log_record_copy_from, + .val_type = &reftable_log_record_val_type, + .encode = &reftable_log_record_encode, + .decode = &reftable_log_record_decode, + .release = &reftable_log_record_release_void, + .is_deletion = &reftable_log_record_is_deletion_void, +}; + +struct reftable_record reftable_new_record(uint8_t typ) +{ + struct reftable_record rec = { NULL }; + switch (typ) { + case BLOCK_TYPE_REF: { + struct reftable_ref_record *r = + reftable_calloc(sizeof(struct reftable_ref_record)); + reftable_record_from_ref(&rec, r); + return rec; + } + + case BLOCK_TYPE_OBJ: { + struct reftable_obj_record *r = + reftable_calloc(sizeof(struct reftable_obj_record)); + reftable_record_from_obj(&rec, r); + return rec; + } + case BLOCK_TYPE_LOG: { + struct reftable_log_record *r = + reftable_calloc(sizeof(struct reftable_log_record)); + reftable_record_from_log(&rec, r); + return rec; + } + case BLOCK_TYPE_INDEX: { + struct reftable_index_record empty = { .last_key = + STRBUF_INIT }; + struct reftable_index_record *r = + reftable_calloc(sizeof(struct reftable_index_record)); + *r = empty; + reftable_record_from_index(&rec, r); + return rec; + } + } + abort(); + return rec; +} + +/* clear out the record, yielding the reftable_record data that was + * encapsulated. */ +static void *reftable_record_yield(struct reftable_record *rec) +{ + void *p = rec->data; + rec->data = NULL; + return p; +} + +void reftable_record_destroy(struct reftable_record *rec) +{ + reftable_record_release(rec); + reftable_free(reftable_record_yield(rec)); +} + +static void reftable_index_record_key(const void *r, struct strbuf *dest) +{ + const struct reftable_index_record *rec = r; + strbuf_reset(dest); + strbuf_addbuf(dest, &rec->last_key); +} + +static void reftable_index_record_copy_from(void *rec, const void *src_rec, + int hash_size) +{ + struct reftable_index_record *dst = rec; + const struct reftable_index_record *src = src_rec; + + strbuf_reset(&dst->last_key); + strbuf_addbuf(&dst->last_key, &src->last_key); + dst->offset = src->offset; +} + +static void reftable_index_record_release(void *rec) +{ + struct reftable_index_record *idx = rec; + strbuf_release(&idx->last_key); +} + +static uint8_t reftable_index_record_val_type(const void *rec) +{ + return 0; +} + +static int reftable_index_record_encode(const void *rec, struct string_view out, + int hash_size) +{ + const struct reftable_index_record *r = + (const struct reftable_index_record *)rec; + struct string_view start = out; + + int n = put_var_int(&out, r->offset); + if (n < 0) + return n; + + string_view_consume(&out, n); + + return start.len - out.len; +} + +static int reftable_index_record_decode(void *rec, struct strbuf key, + uint8_t val_type, struct string_view in, + int hash_size) +{ + struct string_view start = in; + struct reftable_index_record *r = rec; + int n = 0; + + strbuf_reset(&r->last_key); + strbuf_addbuf(&r->last_key, &key); + + n = get_var_int(&r->offset, &in); + if (n < 0) + return n; + + string_view_consume(&in, n); + return start.len - in.len; +} + +static struct reftable_record_vtable reftable_index_record_vtable = { + .key = &reftable_index_record_key, + .type = BLOCK_TYPE_INDEX, + .copy_from = &reftable_index_record_copy_from, + .val_type = &reftable_index_record_val_type, + .encode = &reftable_index_record_encode, + .decode = &reftable_index_record_decode, + .release = &reftable_index_record_release, + .is_deletion = ¬_a_deletion, +}; + +void reftable_record_key(struct reftable_record *rec, struct strbuf *dest) +{ + rec->ops->key(rec->data, dest); +} + +uint8_t reftable_record_type(struct reftable_record *rec) +{ + return rec->ops->type; +} + +int reftable_record_encode(struct reftable_record *rec, struct string_view dest, + int hash_size) +{ + return rec->ops->encode(rec->data, dest, hash_size); +} + +void reftable_record_copy_from(struct reftable_record *rec, + struct reftable_record *src, int hash_size) +{ + assert(src->ops->type == rec->ops->type); + + rec->ops->copy_from(rec->data, src->data, hash_size); +} + +uint8_t reftable_record_val_type(struct reftable_record *rec) +{ + return rec->ops->val_type(rec->data); +} + +int reftable_record_decode(struct reftable_record *rec, struct strbuf key, + uint8_t extra, struct string_view src, int hash_size) +{ + return rec->ops->decode(rec->data, key, extra, src, hash_size); +} + +void reftable_record_release(struct reftable_record *rec) +{ + rec->ops->release(rec->data); +} + +int reftable_record_is_deletion(struct reftable_record *rec) +{ + return rec->ops->is_deletion(rec->data); +} + +void reftable_record_from_ref(struct reftable_record *rec, + struct reftable_ref_record *ref_rec) +{ + assert(!rec->ops); + rec->data = ref_rec; + rec->ops = &reftable_ref_record_vtable; +} + +void reftable_record_from_obj(struct reftable_record *rec, + struct reftable_obj_record *obj_rec) +{ + assert(!rec->ops); + rec->data = obj_rec; + rec->ops = &reftable_obj_record_vtable; +} + +void reftable_record_from_index(struct reftable_record *rec, + struct reftable_index_record *index_rec) +{ + assert(!rec->ops); + rec->data = index_rec; + rec->ops = &reftable_index_record_vtable; +} + +void reftable_record_from_log(struct reftable_record *rec, + struct reftable_log_record *log_rec) +{ + assert(!rec->ops); + rec->data = log_rec; + rec->ops = &reftable_log_record_vtable; +} + +struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec) +{ + assert(reftable_record_type(rec) == BLOCK_TYPE_REF); + return rec->data; +} + +struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec) +{ + assert(reftable_record_type(rec) == BLOCK_TYPE_LOG); + return rec->data; +} + +static int hash_equal(uint8_t *a, uint8_t *b, int hash_size) +{ + if (a && b) + return !memcmp(a, b, hash_size); + + return a == b; +} + +int reftable_ref_record_equal(struct reftable_ref_record *a, + struct reftable_ref_record *b, int hash_size) +{ + assert(hash_size > 0); + if (!(0 == strcmp(a->refname, b->refname) && + a->update_index == b->update_index && + a->value_type == b->value_type)) + return 0; + + switch (a->value_type) { + case REFTABLE_REF_SYMREF: + return !strcmp(a->value.symref, b->value.symref); + case REFTABLE_REF_VAL2: + return hash_equal(a->value.val2.value, b->value.val2.value, + hash_size) && + hash_equal(a->value.val2.target_value, + b->value.val2.target_value, hash_size); + case REFTABLE_REF_VAL1: + return hash_equal(a->value.val1, b->value.val1, hash_size); + case REFTABLE_REF_DELETION: + return 1; + default: + abort(); + } +} + +int reftable_ref_record_compare_name(const void *a, const void *b) +{ + return strcmp(((struct reftable_ref_record *)a)->refname, + ((struct reftable_ref_record *)b)->refname); +} + +int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref) +{ + return ref->value_type == REFTABLE_REF_DELETION; +} + +int reftable_log_record_compare_key(const void *a, const void *b) +{ + const struct reftable_log_record *la = a; + const struct reftable_log_record *lb = b; + + int cmp = strcmp(la->refname, lb->refname); + if (cmp) + return cmp; + if (la->update_index > lb->update_index) + return -1; + return (la->update_index < lb->update_index) ? 1 : 0; +} + +int reftable_log_record_is_deletion(const struct reftable_log_record *log) +{ + return (log->value_type == REFTABLE_LOG_DELETION); +} + +void string_view_consume(struct string_view *s, int n) +{ + s->buf += n; + s->len -= n; +} diff --git a/reftable/record.h b/reftable/record.h new file mode 100644 index 00000000000..498e8c50bf4 --- /dev/null +++ b/reftable/record.h @@ -0,0 +1,139 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef RECORD_H +#define RECORD_H + +#include "system.h" + +#include + +#include "reftable-record.h" + +/* + * A substring of existing string data. This structure takes no responsibility + * for the lifetime of the data it points to. + */ +struct string_view { + uint8_t *buf; + size_t len; +}; + +/* Advance `s.buf` by `n`, and decrease length. */ +void string_view_consume(struct string_view *s, int n); + +/* utilities for de/encoding varints */ + +int get_var_int(uint64_t *dest, struct string_view *in); +int put_var_int(struct string_view *dest, uint64_t val); + +/* Methods for records. */ +struct reftable_record_vtable { + /* encode the key of to a uint8_t strbuf. */ + void (*key)(const void *rec, struct strbuf *dest); + + /* The record type of ('r' for ref). */ + uint8_t type; + + void (*copy_from)(void *dest, const void *src, int hash_size); + + /* a value of [0..7], indicating record subvariants (eg. ref vs. symref + * vs ref deletion) */ + uint8_t (*val_type)(const void *rec); + + /* encodes rec into dest, returning how much space was used. */ + int (*encode)(const void *rec, struct string_view dest, int hash_size); + + /* decode data from `src` into the record. */ + int (*decode)(void *rec, struct strbuf key, uint8_t extra, + struct string_view src, int hash_size); + + /* deallocate and null the record. */ + void (*release)(void *rec); + + /* is this a tombstone? */ + int (*is_deletion)(const void *rec); +}; + +/* record is a generic wrapper for different types of records. */ +struct reftable_record { + void *data; + struct reftable_record_vtable *ops; +}; + +/* returns true for recognized block types. Block start with the block type. */ +int reftable_is_block_type(uint8_t typ); + +/* creates a malloced record of the given type. Dispose with record_destroy */ +struct reftable_record reftable_new_record(uint8_t typ); + +/* Encode `key` into `dest`. Sets `is_restart` to indicate a restart. Returns + * number of bytes written. */ +int reftable_encode_key(int *is_restart, struct string_view dest, + struct strbuf prev_key, struct strbuf key, + uint8_t extra); + +/* Decode into `key` and `extra` from `in` */ +int reftable_decode_key(struct strbuf *key, uint8_t *extra, + struct strbuf last_key, struct string_view in); + +/* reftable_index_record are used internally to speed up lookups. */ +struct reftable_index_record { + uint64_t offset; /* Offset of block */ + struct strbuf last_key; /* Last key of the block. */ +}; + +/* reftable_obj_record stores an object ID => ref mapping. */ +struct reftable_obj_record { + uint8_t *hash_prefix; /* leading bytes of the object ID */ + int hash_prefix_len; /* number of leading bytes. Constant + * across a single table. */ + uint64_t *offsets; /* a vector of file offsets. */ + int offset_len; +}; + +/* see struct record_vtable */ + +void reftable_record_key(struct reftable_record *rec, struct strbuf *dest); +uint8_t reftable_record_type(struct reftable_record *rec); +void reftable_record_copy_from(struct reftable_record *rec, + struct reftable_record *src, int hash_size); +uint8_t reftable_record_val_type(struct reftable_record *rec); +int reftable_record_encode(struct reftable_record *rec, struct string_view dest, + int hash_size); +int reftable_record_decode(struct reftable_record *rec, struct strbuf key, + uint8_t extra, struct string_view src, + int hash_size); +int reftable_record_is_deletion(struct reftable_record *rec); + +/* zeroes out the embedded record */ +void reftable_record_release(struct reftable_record *rec); + +/* clear and deallocate embedded record, and zero `rec`. */ +void reftable_record_destroy(struct reftable_record *rec); + +/* initialize generic records from concrete records. The generic record should + * be zeroed out. */ +void reftable_record_from_obj(struct reftable_record *rec, + struct reftable_obj_record *objrec); +void reftable_record_from_index(struct reftable_record *rec, + struct reftable_index_record *idxrec); +void reftable_record_from_ref(struct reftable_record *rec, + struct reftable_ref_record *refrec); +void reftable_record_from_log(struct reftable_record *rec, + struct reftable_log_record *logrec); +struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref); +struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref); + +/* for qsort. */ +int reftable_ref_record_compare_name(const void *a, const void *b); + +/* for qsort. */ +int reftable_log_record_compare_key(const void *a, const void *b); + +#endif diff --git a/reftable/record_test.c b/reftable/record_test.c new file mode 100644 index 00000000000..f4ad7cace41 --- /dev/null +++ b/reftable/record_test.c @@ -0,0 +1,412 @@ +/* + Copyright 2020 Google LLC + + Use of this source code is governed by a BSD-style + license that can be found in the LICENSE file or at + https://developers.google.com/open-source/licenses/bsd +*/ + +#include "record.h" + +#include "system.h" +#include "basics.h" +#include "constants.h" +#include "test_framework.h" +#include "reftable-tests.h" + +static void test_copy(struct reftable_record *rec) +{ + struct reftable_record copy = + reftable_new_record(reftable_record_type(rec)); + reftable_record_copy_from(©, rec, GIT_SHA1_RAWSZ); + /* do it twice to catch memory leaks */ + reftable_record_copy_from(©, rec, GIT_SHA1_RAWSZ); + switch (reftable_record_type(©)) { + case BLOCK_TYPE_REF: + EXPECT(reftable_ref_record_equal(reftable_record_as_ref(©), + reftable_record_as_ref(rec), + GIT_SHA1_RAWSZ)); + break; + case BLOCK_TYPE_LOG: + EXPECT(reftable_log_record_equal(reftable_record_as_log(©), + reftable_record_as_log(rec), + GIT_SHA1_RAWSZ)); + break; + } + reftable_record_destroy(©); +} + +static void test_varint_roundtrip(void) +{ + uint64_t inputs[] = { 0, + 1, + 27, + 127, + 128, + 257, + 4096, + ((uint64_t)1 << 63), + ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 }; + int i = 0; + for (i = 0; i < ARRAY_SIZE(inputs); i++) { + uint8_t dest[10]; + + struct string_view out = { + .buf = dest, + .len = sizeof(dest), + }; + uint64_t in = inputs[i]; + int n = put_var_int(&out, in); + uint64_t got = 0; + + EXPECT(n > 0); + out.len = n; + n = get_var_int(&got, &out); + EXPECT(n > 0); + + EXPECT(got == in); + } +} + +static void test_common_prefix(void) +{ + struct { + const char *a, *b; + int want; + } cases[] = { + { "abc", "ab", 2 }, + { "", "abc", 0 }, + { "abc", "abd", 2 }, + { "abc", "pqr", 0 }, + }; + + int i = 0; + for (i = 0; i < ARRAY_SIZE(cases); i++) { + struct strbuf a = STRBUF_INIT; + struct strbuf b = STRBUF_INIT; + strbuf_addstr(&a, cases[i].a); + strbuf_addstr(&b, cases[i].b); + EXPECT(common_prefix_size(&a, &b) == cases[i].want); + + strbuf_release(&a); + strbuf_release(&b); + } +} + +static void set_hash(uint8_t *h, int j) +{ + int i = 0; + for (i = 0; i < hash_size(GIT_SHA1_FORMAT_ID); i++) { + h[i] = (j >> i) & 0xff; + } +} + +static void test_reftable_ref_record_roundtrip(void) +{ + int i = 0; + + for (i = REFTABLE_REF_DELETION; i < REFTABLE_NR_REF_VALUETYPES; i++) { + struct reftable_ref_record in = { NULL }; + struct reftable_ref_record out = { NULL }; + struct reftable_record rec_out = { NULL }; + struct strbuf key = STRBUF_INIT; + struct reftable_record rec = { NULL }; + uint8_t buffer[1024] = { 0 }; + struct string_view dest = { + .buf = buffer, + .len = sizeof(buffer), + }; + + int n, m; + + in.value_type = i; + switch (i) { + case REFTABLE_REF_DELETION: + break; + case REFTABLE_REF_VAL1: + in.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); + set_hash(in.value.val1, 1); + break; + case REFTABLE_REF_VAL2: + in.value.val2.value = reftable_malloc(GIT_SHA1_RAWSZ); + set_hash(in.value.val2.value, 1); + in.value.val2.target_value = + reftable_malloc(GIT_SHA1_RAWSZ); + set_hash(in.value.val2.target_value, 2); + break; + case REFTABLE_REF_SYMREF: + in.value.symref = xstrdup("target"); + break; + } + in.refname = xstrdup("refs/heads/master"); + + reftable_record_from_ref(&rec, &in); + test_copy(&rec); + + EXPECT(reftable_record_val_type(&rec) == i); + + reftable_record_key(&rec, &key); + n = reftable_record_encode(&rec, dest, GIT_SHA1_RAWSZ); + EXPECT(n > 0); + + /* decode into a non-zero reftable_record to test for leaks. */ + + reftable_record_from_ref(&rec_out, &out); + m = reftable_record_decode(&rec_out, key, i, dest, + GIT_SHA1_RAWSZ); + EXPECT(n == m); + + EXPECT(reftable_ref_record_equal(&in, &out, GIT_SHA1_RAWSZ)); + reftable_record_release(&rec_out); + + strbuf_release(&key); + reftable_ref_record_release(&in); + } +} + +static void test_reftable_log_record_equal(void) +{ + struct reftable_log_record in[2] = { + { + .refname = xstrdup("refs/heads/master"), + .update_index = 42, + }, + { + .refname = xstrdup("refs/heads/master"), + .update_index = 22, + } + }; + + EXPECT(!reftable_log_record_equal(&in[0], &in[1], GIT_SHA1_RAWSZ)); + in[1].update_index = in[0].update_index; + EXPECT(reftable_log_record_equal(&in[0], &in[1], GIT_SHA1_RAWSZ)); + reftable_log_record_release(&in[0]); + reftable_log_record_release(&in[1]); +} + +static void test_reftable_log_record_roundtrip(void) +{ + int i; + struct reftable_log_record in[2] = { + { + .refname = xstrdup("refs/heads/master"), + .update_index = 42, + .value_type = REFTABLE_LOG_UPDATE, + .value = { + .update = { + .old_hash = reftable_malloc(GIT_SHA1_RAWSZ), + .new_hash = reftable_malloc(GIT_SHA1_RAWSZ), + .name = xstrdup("han-wen"), + .email = xstrdup("hanwen@google.com"), + .message = xstrdup("test"), + .time = 1577123507, + .tz_offset = 100, + }, + } + }, + { + .refname = xstrdup("refs/heads/master"), + .update_index = 22, + .value_type = REFTABLE_LOG_DELETION, + } + }; + set_test_hash(in[0].value.update.new_hash, 1); + set_test_hash(in[0].value.update.old_hash, 2); + for (i = 0; i < ARRAY_SIZE(in); i++) { + struct reftable_record rec = { NULL }; + struct strbuf key = STRBUF_INIT; + uint8_t buffer[1024] = { 0 }; + struct string_view dest = { + .buf = buffer, + .len = sizeof(buffer), + }; + /* populate out, to check for leaks. */ + struct reftable_log_record out = { + .refname = xstrdup("old name"), + .value_type = REFTABLE_LOG_UPDATE, + .value = { + .update = { + .new_hash = reftable_calloc(GIT_SHA1_RAWSZ), + .old_hash = reftable_calloc(GIT_SHA1_RAWSZ), + .name = xstrdup("old name"), + .email = xstrdup("old@email"), + .message = xstrdup("old message"), + }, + }, + }; + struct reftable_record rec_out = { NULL }; + int n, m, valtype; + + reftable_record_from_log(&rec, &in[i]); + + test_copy(&rec); + + reftable_record_key(&rec, &key); + + n = reftable_record_encode(&rec, dest, GIT_SHA1_RAWSZ); + EXPECT(n >= 0); + reftable_record_from_log(&rec_out, &out); + valtype = reftable_record_val_type(&rec); + m = reftable_record_decode(&rec_out, key, valtype, dest, + GIT_SHA1_RAWSZ); + EXPECT(n == m); + + EXPECT(reftable_log_record_equal(&in[i], &out, GIT_SHA1_RAWSZ)); + reftable_log_record_release(&in[i]); + strbuf_release(&key); + reftable_record_release(&rec_out); + } +} + +static void test_u24_roundtrip(void) +{ + uint32_t in = 0x112233; + uint8_t dest[3]; + uint32_t out; + put_be24(dest, in); + out = get_be24(dest); + EXPECT(in == out); +} + +static void test_key_roundtrip(void) +{ + uint8_t buffer[1024] = { 0 }; + struct string_view dest = { + .buf = buffer, + .len = sizeof(buffer), + }; + struct strbuf last_key = STRBUF_INIT; + struct strbuf key = STRBUF_INIT; + struct strbuf roundtrip = STRBUF_INIT; + int restart; + uint8_t extra; + int n, m; + uint8_t rt_extra; + + strbuf_addstr(&last_key, "refs/heads/master"); + strbuf_addstr(&key, "refs/tags/bla"); + extra = 6; + n = reftable_encode_key(&restart, dest, last_key, key, extra); + EXPECT(!restart); + EXPECT(n > 0); + + m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest); + EXPECT(n == m); + EXPECT(0 == strbuf_cmp(&key, &roundtrip)); + EXPECT(rt_extra == extra); + + strbuf_release(&last_key); + strbuf_release(&key); + strbuf_release(&roundtrip); +} + +static void test_reftable_obj_record_roundtrip(void) +{ + uint8_t testHash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 4, 0 }; + uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 }; + struct reftable_obj_record recs[3] = { { + .hash_prefix = testHash1, + .hash_prefix_len = 5, + .offsets = till9, + .offset_len = 3, + }, + { + .hash_prefix = testHash1, + .hash_prefix_len = 5, + .offsets = till9, + .offset_len = 9, + }, + { + .hash_prefix = testHash1, + .hash_prefix_len = 5, + } }; + int i = 0; + for (i = 0; i < ARRAY_SIZE(recs); i++) { + struct reftable_obj_record in = recs[i]; + uint8_t buffer[1024] = { 0 }; + struct string_view dest = { + .buf = buffer, + .len = sizeof(buffer), + }; + struct reftable_record rec = { NULL }; + struct strbuf key = STRBUF_INIT; + struct reftable_obj_record out = { NULL }; + struct reftable_record rec_out = { NULL }; + int n, m; + uint8_t extra; + + reftable_record_from_obj(&rec, &in); + test_copy(&rec); + reftable_record_key(&rec, &key); + n = reftable_record_encode(&rec, dest, GIT_SHA1_RAWSZ); + EXPECT(n > 0); + extra = reftable_record_val_type(&rec); + reftable_record_from_obj(&rec_out, &out); + m = reftable_record_decode(&rec_out, key, extra, dest, + GIT_SHA1_RAWSZ); + EXPECT(n == m); + + EXPECT(in.hash_prefix_len == out.hash_prefix_len); + EXPECT(in.offset_len == out.offset_len); + + EXPECT(!memcmp(in.hash_prefix, out.hash_prefix, + in.hash_prefix_len)); + EXPECT(0 == memcmp(in.offsets, out.offsets, + sizeof(uint64_t) * in.offset_len)); + strbuf_release(&key); + reftable_record_release(&rec_out); + } +} + +static void test_reftable_index_record_roundtrip(void) +{ + struct reftable_index_record in = { + .offset = 42, + .last_key = STRBUF_INIT, + }; + uint8_t buffer[1024] = { 0 }; + struct string_view dest = { + .buf = buffer, + .len = sizeof(buffer), + }; + struct strbuf key = STRBUF_INIT; + struct reftable_record rec = { NULL }; + struct reftable_index_record out = { .last_key = STRBUF_INIT }; + struct reftable_record out_rec = { NULL }; + int n, m; + uint8_t extra; + + strbuf_addstr(&in.last_key, "refs/heads/master"); + reftable_record_from_index(&rec, &in); + reftable_record_key(&rec, &key); + test_copy(&rec); + + EXPECT(0 == strbuf_cmp(&key, &in.last_key)); + n = reftable_record_encode(&rec, dest, GIT_SHA1_RAWSZ); + EXPECT(n > 0); + + extra = reftable_record_val_type(&rec); + reftable_record_from_index(&out_rec, &out); + m = reftable_record_decode(&out_rec, key, extra, dest, GIT_SHA1_RAWSZ); + EXPECT(m == n); + + EXPECT(in.offset == out.offset); + + reftable_record_release(&out_rec); + strbuf_release(&key); + strbuf_release(&in.last_key); +} + +int record_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_reftable_log_record_equal); + RUN_TEST(test_reftable_log_record_roundtrip); + RUN_TEST(test_reftable_ref_record_roundtrip); + RUN_TEST(test_varint_roundtrip); + RUN_TEST(test_key_roundtrip); + RUN_TEST(test_common_prefix); + RUN_TEST(test_reftable_obj_record_roundtrip); + RUN_TEST(test_reftable_index_record_roundtrip); + RUN_TEST(test_u24_roundtrip); + return 0; +} diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h new file mode 100644 index 00000000000..5370d2288c7 --- /dev/null +++ b/reftable/reftable-record.h @@ -0,0 +1,114 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_RECORD_H +#define REFTABLE_RECORD_H + +#include + +/* + * Basic data types + * + * Reftables store the state of each ref in struct reftable_ref_record, and they + * store a sequence of reflog updates in struct reftable_log_record. + */ + +/* reftable_ref_record holds a ref database entry target_value */ +struct reftable_ref_record { + char *refname; /* Name of the ref, malloced. */ + uint64_t update_index; /* Logical timestamp at which this value is + * written */ + + enum { + /* tombstone to hide deletions from earlier tables */ + REFTABLE_REF_DELETION = 0x0, + + /* a simple ref */ + REFTABLE_REF_VAL1 = 0x1, + /* a tag, plus its peeled hash */ + REFTABLE_REF_VAL2 = 0x2, + + /* a symbolic reference */ + REFTABLE_REF_SYMREF = 0x3, +#define REFTABLE_NR_REF_VALUETYPES 4 + } value_type; + union { + uint8_t *val1; /* malloced hash. */ + struct { + uint8_t *value; /* first value, malloced hash */ + uint8_t *target_value; /* second value, malloced hash */ + } val2; + char *symref; /* referent, malloced 0-terminated string */ + } value; +}; + +/* Returns the first hash, or NULL if `rec` is not of type + * REFTABLE_REF_VAL1 or REFTABLE_REF_VAL2. */ +uint8_t *reftable_ref_record_val1(struct reftable_ref_record *rec); + +/* Returns the second hash, or NULL if `rec` is not of type + * REFTABLE_REF_VAL2. */ +uint8_t *reftable_ref_record_val2(struct reftable_ref_record *rec); + +/* returns whether 'ref' represents a deletion */ +int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref); + +/* prints a reftable_ref_record onto stdout. Useful for debugging. */ +void reftable_ref_record_print(struct reftable_ref_record *ref, + uint32_t hash_id); + +/* frees and nulls all pointer values inside `ref`. */ +void reftable_ref_record_release(struct reftable_ref_record *ref); + +/* returns whether two reftable_ref_records are the same. Useful for testing. */ +int reftable_ref_record_equal(struct reftable_ref_record *a, + struct reftable_ref_record *b, int hash_size); + +/* reftable_log_record holds a reflog entry */ +struct reftable_log_record { + char *refname; + uint64_t update_index; /* logical timestamp of a transactional update. + */ + + enum { + /* tombstone to hide deletions from earlier tables */ + REFTABLE_LOG_DELETION = 0x0, + + /* a simple update */ + REFTABLE_LOG_UPDATE = 0x1, +#define REFTABLE_NR_LOG_VALUETYPES 2 + } value_type; + + union { + struct { + uint8_t *new_hash; + uint8_t *old_hash; + char *name; + char *email; + uint64_t time; + int16_t tz_offset; + char *message; + } update; + } value; +}; + +/* returns whether 'ref' represents the deletion of a log record. */ +int reftable_log_record_is_deletion(const struct reftable_log_record *log); + +/* frees and nulls all pointer values. */ +void reftable_log_record_release(struct reftable_log_record *log); + +/* returns whether two records are equal. Useful for testing. */ +int reftable_log_record_equal(struct reftable_log_record *a, + struct reftable_log_record *b, int hash_size); + +/* dumps a reftable_log_record on stdout, for debugging/testing. */ +void reftable_log_record_print(struct reftable_log_record *log, + uint32_t hash_id); + +#endif diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 3b58e423e7b..09d4b83ef9b 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -4,6 +4,6 @@ int cmd__reftable(int argc, const char **argv) { basics_test_main(argc, argv); - + record_test_main(argc, argv); return 0; } From patchwork Thu Sep 9 18:47:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7C48C433F5 for ; Thu, 9 Sep 2021 18:48:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9243E61108 for ; Thu, 9 Sep 2021 18:48:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245136AbhIIStm (ORCPT ); Thu, 9 Sep 2021 14:49:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48336 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241940AbhIIStB (ORCPT ); Thu, 9 Sep 2021 14:49:01 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E0FDC061575 for ; Thu, 9 Sep 2021 11:47:52 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id l18-20020a05600c4f1200b002f8cf606262so2260365wmq.1 for ; Thu, 09 Sep 2021 11:47:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=0Ua1fCeepG0NUqNz6Kpuy2K+kf8j7GEZIvXxM3FPw28=; b=YbKNB9bpkuTiZCxeCswYCVMnxDOF/4smfwefRe9hUYAcUcLimM1iajdKMrpl/e5DRo 72IlKNWQNP7NhlJ91aBTiX2MkoHUQyYtzgS9B0NpRIsR+TKtoWjk2sIJHAN6YP4SxMEt mvQXiYZFW4vGary8WYbW7fu5qHHvlswmbt4yrvcb8ZT17sonoT6I+HJ12Wv8BktuKyLG nRHChbR/JI527ul+iruINfIakbv6NuI6wIo2dGkQ03Cd822rFdRPfgVF2i034UNXsp7I kxR2YwEyJ/bZReX3JnU7v0QDfcf2oH87m/s5aKjdx7fhkuj953HkYWNivi7C4MYQMYl7 t0pQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=0Ua1fCeepG0NUqNz6Kpuy2K+kf8j7GEZIvXxM3FPw28=; b=6YNIYgXg9AyptebuUeVDLzOvksbhf/qdtUScsB9TJ73Icf/SCeDKl/I/lf+ZmQmHlh rZ0YHhgCj1nfxu66l/5Nyp3UfQos41/J4w8RJ1aVgzoNcZhS7rbyynO0teKGyKTWok6w 3ONlzQYXEE0LsOuJwmq8aYv7aPWMvwshVg4K/MHaQLl+XmbtrmB7prSWCvKn02PXOSUF cbYGqcj2QiS3xJuNmWocKl+PNmJSlXTYyK9STjyajgdkuPezlcFk0JuLKrso2rx9S76d 2GLkZXdHYy2uXT+PUP9Cbog6flrnU9BcLi958HpToBuocNCDwfa1zRUAJ9HPwPePrOSW C8yw== X-Gm-Message-State: AOAM5301xqc0nB0ewKP66w8fuJObYHSh54YrkwEREAMB3d1bPkybNkE3 T2I9GNBF7D9Bj2zLtlU7bjYdj1lbahs= X-Google-Smtp-Source: ABdhPJxrs0wIfzS7Q/Qq+ePXqZWw9Ld9dkhSJnAROM6vSP7EXEpuU/J8uYB7ZLOu1S3tFyXTu21MdA== X-Received: by 2002:a1c:28b:: with SMTP id 133mr4489204wmc.138.1631213270651; Thu, 09 Sep 2021 11:47:50 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x9sm2189925wmi.30.2021.09.09.11.47.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:50 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:32 +0000 Subject: [PATCH v2 07/19] Provide zlib's uncompress2 from compat/zlib-compat.c MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This will be needed for reading reflog blocks in reftable. Helped-by: Carlo Marcelo Arenas Belón Signed-off-by: Han-Wen Nienhuys --- Makefile | 7 +++ ci/lib.sh | 1 + compat/.gitattributes | 1 + compat/zlib-uncompress2.c | 95 +++++++++++++++++++++++++++++++++++++++ config.mak.uname | 5 +++ configure.ac | 13 ++++++ 6 files changed, 122 insertions(+) create mode 100644 compat/.gitattributes create mode 100644 compat/zlib-uncompress2.c diff --git a/Makefile b/Makefile index 14d025ea424..1c2d0e6be7c 100644 --- a/Makefile +++ b/Makefile @@ -256,6 +256,8 @@ all:: # # Define NO_DEFLATE_BOUND if your zlib does not have deflateBound. # +# Define NO_UNCOMPRESS2 if your zlib does not have uncompress2. +# # Define NO_NORETURN if using buggy versions of gcc 4.6+ and profile feedback, # as the compiler can crash (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49299) # @@ -1738,6 +1740,11 @@ ifdef NO_DEFLATE_BOUND BASIC_CFLAGS += -DNO_DEFLATE_BOUND endif +ifdef NO_UNCOMPRESS2 + BASIC_CFLAGS += -DNO_UNCOMPRESS2 + REFTABLE_OBJS += compat/zlib-uncompress2.o +endif + ifdef NO_POSIX_GOODIES BASIC_CFLAGS += -DNO_POSIX_GOODIES endif diff --git a/ci/lib.sh b/ci/lib.sh index 476c3f369f5..5711c63979d 100755 --- a/ci/lib.sh +++ b/ci/lib.sh @@ -224,6 +224,7 @@ linux-gcc-default) ;; Linux32) CC=gcc + MAKEFLAGS="$MAKEFLAGS NO_UNCOMPRESS2=1" ;; linux-musl) CC=gcc diff --git a/compat/.gitattributes b/compat/.gitattributes new file mode 100644 index 00000000000..40dbfb170da --- /dev/null +++ b/compat/.gitattributes @@ -0,0 +1 @@ +/zlib-uncompress2.c whitespace=-indent-with-non-tab,-trailing-space diff --git a/compat/zlib-uncompress2.c b/compat/zlib-uncompress2.c new file mode 100644 index 00000000000..3dcb90b2839 --- /dev/null +++ b/compat/zlib-uncompress2.c @@ -0,0 +1,95 @@ +/* taken from zlib's uncompr.c + + commit cacf7f1d4e3d44d871b605da3b647f07d718623f + Author: Mark Adler + Date: Sun Jan 15 09:18:46 2017 -0800 + + zlib 1.2.11 + +*/ + +#include "../reftable/system.h" +#define z_const + +/* + * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler + * For conditions of distribution and use, see copyright notice in zlib.h + */ + +#include + +/* clang-format off */ + +/* =========================================================================== + Decompresses the source buffer into the destination buffer. *sourceLen is + the byte length of the source buffer. Upon entry, *destLen is the total size + of the destination buffer, which must be large enough to hold the entire + uncompressed data. (The size of the uncompressed data must have been saved + previously by the compressor and transmitted to the decompressor by some + mechanism outside the scope of this compression library.) Upon exit, + *destLen is the size of the decompressed data and *sourceLen is the number + of source bytes consumed. Upon return, source + *sourceLen points to the + first unused input byte. + + uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough + memory, Z_BUF_ERROR if there was not enough room in the output buffer, or + Z_DATA_ERROR if the input data was corrupted, including if the input data is + an incomplete zlib stream. +*/ +int ZEXPORT uncompress2 ( + Bytef *dest, + uLongf *destLen, + const Bytef *source, + uLong *sourceLen) { + z_stream stream; + int err; + const uInt max = (uInt)-1; + uLong len, left; + Byte buf[1]; /* for detection of incomplete stream when *destLen == 0 */ + + len = *sourceLen; + if (*destLen) { + left = *destLen; + *destLen = 0; + } + else { + left = 1; + dest = buf; + } + + stream.next_in = (z_const Bytef *)source; + stream.avail_in = 0; + stream.zalloc = (alloc_func)0; + stream.zfree = (free_func)0; + stream.opaque = (voidpf)0; + + err = inflateInit(&stream); + if (err != Z_OK) return err; + + stream.next_out = dest; + stream.avail_out = 0; + + do { + if (stream.avail_out == 0) { + stream.avail_out = left > (uLong)max ? max : (uInt)left; + left -= stream.avail_out; + } + if (stream.avail_in == 0) { + stream.avail_in = len > (uLong)max ? max : (uInt)len; + len -= stream.avail_in; + } + err = inflate(&stream, Z_NO_FLUSH); + } while (err == Z_OK); + + *sourceLen -= len + stream.avail_in; + if (dest != buf) + *destLen = stream.total_out; + else if (stream.total_out && err == Z_BUF_ERROR) + left = 1; + + inflateEnd(&stream); + return err == Z_STREAM_END ? Z_OK : + err == Z_NEED_DICT ? Z_DATA_ERROR : + err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR : + err; +} diff --git a/config.mak.uname b/config.mak.uname index 76516aaa9a5..9beacda4434 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -258,6 +258,11 @@ ifeq ($(uname_S),FreeBSD) FILENO_IS_A_MACRO = UnfortunatelyYes endif ifeq ($(uname_S),OpenBSD) + NO_UNCOMPRESS2 = YesPlease + # Versions < 7.0 need compatibility layer + ifeq ($(shell expr "$(uname_R)" : "[1-6]\."),2) + NO_UNCOMPRESS2 = UnfortunatelyYes + endif NO_STRCASESTR = YesPlease NO_MEMMEM = YesPlease USE_ST_TIMESPEC = YesPlease diff --git a/configure.ac b/configure.ac index 031e8d3fee8..c3a913103d0 100644 --- a/configure.ac +++ b/configure.ac @@ -672,9 +672,22 @@ AC_LINK_IFELSE([ZLIBTEST_SRC], NO_DEFLATE_BOUND=yes]) LIBS="$old_LIBS" +AC_DEFUN([ZLIBTEST_UNCOMPRESS2_SRC], [ +AC_LANG_PROGRAM([#include ], + [uncompress2(NULL,NULL,NULL,NULL);])]) +AC_MSG_CHECKING([for uncompress2 in -lz]) +old_LIBS="$LIBS" +LIBS="$LIBS -lz" +AC_LINK_IFELSE([ZLIBTEST_UNCOMPRESS2_SRC], + [AC_MSG_RESULT([yes])], + [AC_MSG_RESULT([no]) + NO_UNCOMPRESS2=yes]) +LIBS="$old_LIBS" + GIT_UNSTASH_FLAGS($ZLIB_PATH) GIT_CONF_SUBST([NO_DEFLATE_BOUND]) +GIT_CONF_SUBST([NO_UNCOMPRESS2]) # # Define NEEDS_SOCKET if linking with libc is not enough (SunOS, From patchwork Thu Sep 9 18:47:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77A86C433EF for ; Thu, 9 Sep 2021 18:48:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 57F8E61108 for ; Thu, 9 Sep 2021 18:48:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245690AbhIIStt (ORCPT ); Thu, 9 Sep 2021 14:49:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244445AbhIIStC (ORCPT ); Thu, 9 Sep 2021 14:49:02 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C562EC061768 for ; Thu, 9 Sep 2021 11:47:52 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id v10so3963885wrd.4 for ; Thu, 09 Sep 2021 11:47:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=3AIqWeKJLN9WiKC4P9nyqQkIppFSBdh/700q0P1RL98=; b=WqCH2buCEAK0g1xMaFrhRiuZheDDs4I8doiMUt1VzjX4t7JkEjewHVai9m9lsUDB17 E+6I4D9yvyiJcldBfGBkVxSrEtgFKhG4LTEa6zzRaw6koNzkL16kqyjxBWhRhmrEqqLf pTzo8kyB/hOsaim4Lo31w/3AQkXRKA7VrgZsNXHVQZCNEdwkGIOLgUiL2x6AbgII3lkI +lOIP8pRaPCWTGlMcRggl1kwncDbVDbaSDHpSJUI9UHKkppQiadPC5L9QDh6ZvOkOtKy AxwMhGNsD2Y3YGbnkrswkecJrMZ0r8Lm0Yn7P82POYRQIYzsOYZTKW6XTuOHaI3TzzH2 cjLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=3AIqWeKJLN9WiKC4P9nyqQkIppFSBdh/700q0P1RL98=; b=nyF2j/ZYKCFZkKfD4FF4P1zR6Wb0uDyaYxGootrH5lfOucqx8qRFQVr6iv59gKv3jq HDFtt6p4r7Fc+XNZbHFYfVbX8ZrcmjlqJcoGfeC6f3UqY5ZzN3HTI8yvwmcpuM+iXRZY 7wt5zOsVDMW8kll9Bw5FNMinaI5UuD/GZ+ef4SM66Eh/MO5djHTLsTUgqwtyAhL3GDfk snrD2+RYEFGZA3/38asiM2tye/RA40FSb3VX+mgZysk7TaMmsPXBGIwN8URcnfHYsKfW hN5ED/tE6StoBjKuvFMx18PIYENFigVGRuqEGJewwDL6iafReIFrsTTVWiF0yO7kjfog vJ5g== X-Gm-Message-State: AOAM530QXfpvKshazXyNnui1LLT99bsOZyacWqtITYuA4wuusvRg6eQ1 58gQuar6S7i7oijIwW6GpSbcaEeA7Ss= X-Google-Smtp-Source: ABdhPJztUW9awapS0y0D3DBdfyCbbV25ZWhQGBR0IikEVJ7aEt4mbQHU90b4F8Y1E5E3/8YHz9+7rA== X-Received: by 2002:a05:6000:11d1:: with SMTP id i17mr5481443wrx.424.1631213271225; Thu, 09 Sep 2021 11:47:51 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o5sm2425429wrw.17.2021.09.09.11.47.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:50 -0700 (PDT) Message-Id: <97f7ee04886d573f6b0f2e0d54a853cf016e7d74.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:33 +0000 Subject: [PATCH v2 08/19] reftable: reading/writing blocks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Signed-off-by: Han-Wen Nienhuys --- Makefile | 2 + reftable/block.c | 439 +++++++++++++++++++++++++++++++++++++++ reftable/block.h | 127 +++++++++++ reftable/block_test.c | 120 +++++++++++ t/helper/test-reftable.c | 1 + 5 files changed, 689 insertions(+) create mode 100644 reftable/block.c create mode 100644 reftable/block.h create mode 100644 reftable/block_test.c diff --git a/Makefile b/Makefile index 1c2d0e6be7c..14a0a1f7643 100644 --- a/Makefile +++ b/Makefile @@ -2458,10 +2458,12 @@ xdiff-objs: $(XDIFF_OBJS) REFTABLE_OBJS += reftable/basics.o REFTABLE_OBJS += reftable/error.o +REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o REFTABLE_OBJS += reftable/record.o +REFTABLE_TEST_OBJS += reftable/block_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o REFTABLE_TEST_OBJS += reftable/basics_test.o diff --git a/reftable/block.c b/reftable/block.c new file mode 100644 index 00000000000..11387b260aa --- /dev/null +++ b/reftable/block.c @@ -0,0 +1,439 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "block.h" + +#include "blocksource.h" +#include "constants.h" +#include "record.h" +#include "reftable-error.h" +#include "system.h" +#include + +int header_size(int version) +{ + switch (version) { + case 1: + return 24; + case 2: + return 28; + } + abort(); +} + +int footer_size(int version) +{ + switch (version) { + case 1: + return 68; + case 2: + return 72; + } + abort(); +} + +static int block_writer_register_restart(struct block_writer *w, int n, + int is_restart, struct strbuf *key) +{ + int rlen = w->restart_len; + if (rlen >= MAX_RESTARTS) { + is_restart = 0; + } + + if (is_restart) { + rlen++; + } + if (2 + 3 * rlen + n > w->block_size - w->next) + return -1; + if (is_restart) { + if (w->restart_len == w->restart_cap) { + w->restart_cap = w->restart_cap * 2 + 1; + w->restarts = reftable_realloc( + w->restarts, sizeof(uint32_t) * w->restart_cap); + } + + w->restarts[w->restart_len++] = w->next; + } + + w->next += n; + + strbuf_reset(&w->last_key); + strbuf_addbuf(&w->last_key, key); + w->entries++; + return 0; +} + +void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf, + uint32_t block_size, uint32_t header_off, int hash_size) +{ + bw->buf = buf; + bw->hash_size = hash_size; + bw->block_size = block_size; + bw->header_off = header_off; + bw->buf[header_off] = typ; + bw->next = header_off + 4; + bw->restart_interval = 16; + bw->entries = 0; + bw->restart_len = 0; + bw->last_key.len = 0; +} + +uint8_t block_writer_type(struct block_writer *bw) +{ + return bw->buf[bw->header_off]; +} + +/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on + success */ +int block_writer_add(struct block_writer *w, struct reftable_record *rec) +{ + struct strbuf empty = STRBUF_INIT; + struct strbuf last = + w->entries % w->restart_interval == 0 ? empty : w->last_key; + struct string_view out = { + .buf = w->buf + w->next, + .len = w->block_size - w->next, + }; + + struct string_view start = out; + + int is_restart = 0; + struct strbuf key = STRBUF_INIT; + int n = 0; + + reftable_record_key(rec, &key); + n = reftable_encode_key(&is_restart, out, last, key, + reftable_record_val_type(rec)); + if (n < 0) + goto done; + string_view_consume(&out, n); + + n = reftable_record_encode(rec, out, w->hash_size); + if (n < 0) + goto done; + string_view_consume(&out, n); + + if (block_writer_register_restart(w, start.len - out.len, is_restart, + &key) < 0) + goto done; + + strbuf_release(&key); + return 0; + +done: + strbuf_release(&key); + return -1; +} + +int block_writer_finish(struct block_writer *w) +{ + int i = 0; + for (i = 0; i < w->restart_len; i++) { + put_be24(w->buf + w->next, w->restarts[i]); + w->next += 3; + } + + put_be16(w->buf + w->next, w->restart_len); + w->next += 2; + put_be24(w->buf + 1 + w->header_off, w->next); + + if (block_writer_type(w) == BLOCK_TYPE_LOG) { + int block_header_skip = 4 + w->header_off; + uint8_t *compressed = NULL; + int zresult = 0; + uLongf src_len = w->next - block_header_skip; + size_t dest_cap = src_len; + + compressed = reftable_malloc(dest_cap); + while (1) { + uLongf out_dest_len = dest_cap; + + zresult = compress2(compressed, &out_dest_len, + w->buf + block_header_skip, src_len, + 9); + if (zresult == Z_BUF_ERROR) { + dest_cap *= 2; + compressed = + reftable_realloc(compressed, dest_cap); + continue; + } + + if (Z_OK != zresult) { + reftable_free(compressed); + return REFTABLE_ZLIB_ERROR; + } + + memcpy(w->buf + block_header_skip, compressed, + out_dest_len); + w->next = out_dest_len + block_header_skip; + reftable_free(compressed); + break; + } + } + return w->next; +} + +uint8_t block_reader_type(struct block_reader *r) +{ + return r->block.data[r->header_off]; +} + +int block_reader_init(struct block_reader *br, struct reftable_block *block, + uint32_t header_off, uint32_t table_block_size, + int hash_size) +{ + uint32_t full_block_size = table_block_size; + uint8_t typ = block->data[header_off]; + uint32_t sz = get_be24(block->data + header_off + 1); + + uint16_t restart_count = 0; + uint32_t restart_start = 0; + uint8_t *restart_bytes = NULL; + + if (!reftable_is_block_type(typ)) + return REFTABLE_FORMAT_ERROR; + + if (typ == BLOCK_TYPE_LOG) { + int block_header_skip = 4 + header_off; + uLongf dst_len = sz - block_header_skip; /* total size of dest + buffer. */ + uLongf src_len = block->len - block_header_skip; + /* Log blocks specify the *uncompressed* size in their header. + */ + uint8_t *uncompressed = reftable_malloc(sz); + + /* Copy over the block header verbatim. It's not compressed. */ + memcpy(uncompressed, block->data, block_header_skip); + + /* Uncompress */ + if (Z_OK != + uncompress2(uncompressed + block_header_skip, &dst_len, + block->data + block_header_skip, &src_len)) { + reftable_free(uncompressed); + return REFTABLE_ZLIB_ERROR; + } + + if (dst_len + block_header_skip != sz) + return REFTABLE_FORMAT_ERROR; + + /* We're done with the input data. */ + reftable_block_done(block); + block->data = uncompressed; + block->len = sz; + block->source = malloc_block_source(); + full_block_size = src_len + block_header_skip; + } else if (full_block_size == 0) { + full_block_size = sz; + } else if (sz < full_block_size && sz < block->len && + block->data[sz] != 0) { + /* If the block is smaller than the full block size, it is + padded (data followed by '\0') or the next block is + unaligned. */ + full_block_size = sz; + } + + restart_count = get_be16(block->data + sz - 2); + restart_start = sz - 2 - 3 * restart_count; + restart_bytes = block->data + restart_start; + + /* transfer ownership. */ + br->block = *block; + block->data = NULL; + block->len = 0; + + br->hash_size = hash_size; + br->block_len = restart_start; + br->full_block_size = full_block_size; + br->header_off = header_off; + br->restart_count = restart_count; + br->restart_bytes = restart_bytes; + + return 0; +} + +static uint32_t block_reader_restart_offset(struct block_reader *br, int i) +{ + return get_be24(br->restart_bytes + 3 * i); +} + +void block_reader_start(struct block_reader *br, struct block_iter *it) +{ + it->br = br; + strbuf_reset(&it->last_key); + it->next_off = br->header_off + 4; +} + +struct restart_find_args { + int error; + struct strbuf key; + struct block_reader *r; +}; + +static int restart_key_less(size_t idx, void *args) +{ + struct restart_find_args *a = args; + uint32_t off = block_reader_restart_offset(a->r, idx); + struct string_view in = { + .buf = a->r->block.data + off, + .len = a->r->block_len - off, + }; + + /* the restart key is verbatim in the block, so this could avoid the + alloc for decoding the key */ + struct strbuf rkey = STRBUF_INIT; + struct strbuf last_key = STRBUF_INIT; + uint8_t unused_extra; + int n = reftable_decode_key(&rkey, &unused_extra, last_key, in); + int result; + if (n < 0) { + a->error = 1; + return -1; + } + + result = strbuf_cmp(&a->key, &rkey); + strbuf_release(&rkey); + return result; +} + +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src) +{ + dest->br = src->br; + dest->next_off = src->next_off; + strbuf_reset(&dest->last_key); + strbuf_addbuf(&dest->last_key, &src->last_key); +} + +int block_iter_next(struct block_iter *it, struct reftable_record *rec) +{ + struct string_view in = { + .buf = it->br->block.data + it->next_off, + .len = it->br->block_len - it->next_off, + }; + struct string_view start = in; + struct strbuf key = STRBUF_INIT; + uint8_t extra = 0; + int n = 0; + + if (it->next_off >= it->br->block_len) + return 1; + + n = reftable_decode_key(&key, &extra, it->last_key, in); + if (n < 0) + return -1; + + string_view_consume(&in, n); + n = reftable_record_decode(rec, key, extra, in, it->br->hash_size); + if (n < 0) + return -1; + string_view_consume(&in, n); + + strbuf_reset(&it->last_key); + strbuf_addbuf(&it->last_key, &key); + it->next_off += start.len - in.len; + strbuf_release(&key); + return 0; +} + +int block_reader_first_key(struct block_reader *br, struct strbuf *key) +{ + struct strbuf empty = STRBUF_INIT; + int off = br->header_off + 4; + struct string_view in = { + .buf = br->block.data + off, + .len = br->block_len - off, + }; + + uint8_t extra = 0; + int n = reftable_decode_key(key, &extra, empty, in); + if (n < 0) + return n; + + return 0; +} + +int block_iter_seek(struct block_iter *it, struct strbuf *want) +{ + return block_reader_seek(it->br, it, want); +} + +void block_iter_close(struct block_iter *it) +{ + strbuf_release(&it->last_key); +} + +int block_reader_seek(struct block_reader *br, struct block_iter *it, + struct strbuf *want) +{ + struct restart_find_args args = { + .key = *want, + .r = br, + }; + struct reftable_record rec = reftable_new_record(block_reader_type(br)); + struct strbuf key = STRBUF_INIT; + int err = 0; + struct block_iter next = { + .last_key = STRBUF_INIT, + }; + + int i = binsearch(br->restart_count, &restart_key_less, &args); + if (args.error) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + + it->br = br; + if (i > 0) { + i--; + it->next_off = block_reader_restart_offset(br, i); + } else { + it->next_off = br->header_off + 4; + } + + /* We're looking for the last entry less/equal than the wanted key, so + we have to go one entry too far and then back up. + */ + while (1) { + block_iter_copy_from(&next, it); + err = block_iter_next(&next, &rec); + if (err < 0) + goto done; + + reftable_record_key(&rec, &key); + if (err > 0 || strbuf_cmp(&key, want) >= 0) { + err = 0; + goto done; + } + + block_iter_copy_from(it, &next); + } + +done: + strbuf_release(&key); + strbuf_release(&next.last_key); + reftable_record_destroy(&rec); + + return err; +} + +void block_writer_release(struct block_writer *bw) +{ + FREE_AND_NULL(bw->restarts); + strbuf_release(&bw->last_key); + /* the block is not owned. */ +} + +void reftable_block_done(struct reftable_block *blockp) +{ + struct reftable_block_source source = blockp->source; + if (blockp && source.ops) + source.ops->return_block(source.arg, blockp); + blockp->data = NULL; + blockp->len = 0; + blockp->source.ops = NULL; + blockp->source.arg = NULL; +} diff --git a/reftable/block.h b/reftable/block.h new file mode 100644 index 00000000000..e207706a644 --- /dev/null +++ b/reftable/block.h @@ -0,0 +1,127 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef BLOCK_H +#define BLOCK_H + +#include "basics.h" +#include "record.h" +#include "reftable-blocksource.h" + +/* + * Writes reftable blocks. The block_writer is reused across blocks to minimize + * allocation overhead. + */ +struct block_writer { + uint8_t *buf; + uint32_t block_size; + + /* Offset ofof the global header. Nonzero in the first block only. */ + uint32_t header_off; + + /* How often to restart keys. */ + int restart_interval; + int hash_size; + + /* Offset of next uint8_t to write. */ + uint32_t next; + uint32_t *restarts; + uint32_t restart_len; + uint32_t restart_cap; + + struct strbuf last_key; + int entries; +}; + +/* + * initializes the blockwriter to write `typ` entries, using `buf` as temporary + * storage. `buf` is not owned by the block_writer. */ +void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf, + uint32_t block_size, uint32_t header_off, int hash_size); + +/* returns the block type (eg. 'r' for ref records. */ +uint8_t block_writer_type(struct block_writer *bw); + +/* appends the record, or -1 if it doesn't fit. */ +int block_writer_add(struct block_writer *w, struct reftable_record *rec); + +/* appends the key restarts, and compress the block if necessary. */ +int block_writer_finish(struct block_writer *w); + +/* clears out internally allocated block_writer members. */ +void block_writer_release(struct block_writer *bw); + +/* Read a block. */ +struct block_reader { + /* offset of the block header; nonzero for the first block in a + * reftable. */ + uint32_t header_off; + + /* the memory block */ + struct reftable_block block; + int hash_size; + + /* size of the data, excluding restart data. */ + uint32_t block_len; + uint8_t *restart_bytes; + uint16_t restart_count; + + /* size of the data in the file. For log blocks, this is the compressed + * size. */ + uint32_t full_block_size; +}; + +/* Iterate over entries in a block */ +struct block_iter { + /* offset within the block of the next entry to read. */ + uint32_t next_off; + struct block_reader *br; + + /* key for last entry we read. */ + struct strbuf last_key; +}; + +/* initializes a block reader. */ +int block_reader_init(struct block_reader *br, struct reftable_block *bl, + uint32_t header_off, uint32_t table_block_size, + int hash_size); + +/* Position `it` at start of the block */ +void block_reader_start(struct block_reader *br, struct block_iter *it); + +/* Position `it` to the `want` key in the block */ +int block_reader_seek(struct block_reader *br, struct block_iter *it, + struct strbuf *want); + +/* Returns the block type (eg. 'r' for refs) */ +uint8_t block_reader_type(struct block_reader *r); + +/* Decodes the first key in the block */ +int block_reader_first_key(struct block_reader *br, struct strbuf *key); + +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src); + +/* return < 0 for error, 0 for OK, > 0 for EOF. */ +int block_iter_next(struct block_iter *it, struct reftable_record *rec); + +/* Seek to `want` with in the block pointed to by `it` */ +int block_iter_seek(struct block_iter *it, struct strbuf *want); + +/* deallocate memory for `it`. The block reader and its block is left intact. */ +void block_iter_close(struct block_iter *it); + +/* size of file header, depending on format version */ +int header_size(int version); + +/* size of file footer, depending on format version */ +int footer_size(int version); + +/* returns a block to its source. */ +void reftable_block_done(struct reftable_block *ret); + +#endif diff --git a/reftable/block_test.c b/reftable/block_test.c new file mode 100644 index 00000000000..4b3ea262dcb --- /dev/null +++ b/reftable/block_test.c @@ -0,0 +1,120 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "block.h" + +#include "system.h" +#include "blocksource.h" +#include "basics.h" +#include "constants.h" +#include "record.h" +#include "test_framework.h" +#include "reftable-tests.h" + +static void test_block_read_write(void) +{ + const int header_off = 21; /* random */ + char *names[30]; + const int N = ARRAY_SIZE(names); + const int block_size = 1024; + struct reftable_block block = { NULL }; + struct block_writer bw = { + .last_key = STRBUF_INIT, + }; + struct reftable_ref_record ref = { NULL }; + struct reftable_record rec = { NULL }; + int i = 0; + int n; + struct block_reader br = { 0 }; + struct block_iter it = { .last_key = STRBUF_INIT }; + int j = 0; + struct strbuf want = STRBUF_INIT; + + block.data = reftable_calloc(block_size); + block.len = block_size; + block.source = malloc_block_source(); + block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size, + header_off, hash_size(GIT_SHA1_FORMAT_ID)); + reftable_record_from_ref(&rec, &ref); + + for (i = 0; i < N; i++) { + char name[100]; + uint8_t hash[GIT_SHA1_RAWSZ]; + snprintf(name, sizeof(name), "branch%02d", i); + memset(hash, i, sizeof(hash)); + + ref.refname = name; + ref.value_type = REFTABLE_REF_VAL1; + ref.value.val1 = hash; + + names[i] = xstrdup(name); + n = block_writer_add(&bw, &rec); + ref.refname = NULL; + ref.value_type = REFTABLE_REF_DELETION; + EXPECT(n == 0); + } + + n = block_writer_finish(&bw); + EXPECT(n > 0); + + block_writer_release(&bw); + + block_reader_init(&br, &block, header_off, block_size, GIT_SHA1_RAWSZ); + + block_reader_start(&br, &it); + + while (1) { + int r = block_iter_next(&it, &rec); + EXPECT(r >= 0); + if (r > 0) { + break; + } + EXPECT_STREQ(names[j], ref.refname); + j++; + } + + reftable_record_release(&rec); + block_iter_close(&it); + + for (i = 0; i < N; i++) { + struct block_iter it = { .last_key = STRBUF_INIT }; + strbuf_reset(&want); + strbuf_addstr(&want, names[i]); + + n = block_reader_seek(&br, &it, &want); + EXPECT(n == 0); + + n = block_iter_next(&it, &rec); + EXPECT(n == 0); + + EXPECT_STREQ(names[i], ref.refname); + + want.len--; + n = block_reader_seek(&br, &it, &want); + EXPECT(n == 0); + + n = block_iter_next(&it, &rec); + EXPECT(n == 0); + EXPECT_STREQ(names[10 * (i / 10)], ref.refname); + + block_iter_close(&it); + } + + reftable_record_release(&rec); + reftable_block_done(&br.block); + strbuf_release(&want); + for (i = 0; i < N; i++) { + reftable_free(names[i]); + } +} + +int block_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_block_read_write); + return 0; +} diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 09d4b83ef9b..c9deeaf08c7 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -4,6 +4,7 @@ int cmd__reftable(int argc, const char **argv) { basics_test_main(argc, argv); + block_test_main(argc, argv); record_test_main(argc, argv); return 0; } From patchwork Thu Sep 9 18:47:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1B5BC433FE for ; Thu, 9 Sep 2021 18:48:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C79BE61100 for ; Thu, 9 Sep 2021 18:48:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245729AbhIIStv (ORCPT ); Thu, 9 Sep 2021 14:49:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48338 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245014AbhIIStI (ORCPT ); Thu, 9 Sep 2021 14:49:08 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CCC9C0613C1 for ; Thu, 9 Sep 2021 11:47:53 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id d6so3940778wrc.11 for ; Thu, 09 Sep 2021 11:47:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=gnbFrauqPj2CcoiVdgZ8s35rgWABFI51ejPN6txeGCE=; b=C0yFvJAy+4Z9qsFH8vuzLVN0jLTSVBCzkXTfO0F4xh6UrwwvWZzWaQcNsHMcIJymtS mFwyFAsDBP/0duDg8Q2s522XMgwufNSmsoGBvuXSOREHwCbT4EdAS4vdQtsXrvRjVhb+ g7g6ENPgK9GsNxzOcE5wBx1oX7ErGQdXxSi5Sspj59qd3DMZ/DryuCZJCQJUgLbW+JFr 9aSbOcYxEvvfNCJ8QxxevY3elfOGnTl9aFD27v+JWNwlhl5DqCSFwg+3FsksJl8l3GP2 0ej/7yP5rPHD0aREoUEa6rHweqZKcARzEEPTyi5ka8/VDNcALfdGDuPt7Cua6IxKhfqQ crSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=gnbFrauqPj2CcoiVdgZ8s35rgWABFI51ejPN6txeGCE=; b=n8O7nJHSS9ENseVOcdnVV8WIlq4SQ1aV+/n5O6tiFBNMtpwoi6aJsOxpuWh4vhW4RF wvKA2goDyCpQNtzYgr6JVM97YBBaoBge2MrRxsAZ0yck2g1arfbXe4CWzUA8CfWi2DsX XcR71TJ7Twmk18ZKiSkzqwSZdGZ5CUb1OalRjrgBf8Zrf68kKCrzv1NSsTQrqvK5rR1E 6lItOpH4FBH0OIVIP4GeMfm353nfUSH3xl/xIqt1Sj2Jkmbc2wJcbq7A9+qmXMreHTXT iPnR67D1S3xhn+SRPumu8zxistb9YInpZ0uXGZtzmHs7YWlTTNVeQYxQ/MISOjExlG3a A0/w== X-Gm-Message-State: AOAM532J/ZrJ82bC0zApqPI7nKUszc4OrjDrRil2K0T/w/TaLEoc/l4Q zluL/On1k4nHCM+R+r5ytvzcd+a9qT0= X-Google-Smtp-Source: ABdhPJwc5qa/q86I33vunL8da15T+51A1ZqM/Ihbb+fQZZf62GlRb+sTgfNyqc6oewKP+Dlofj+CWg== X-Received: by 2002:a5d:6781:: with SMTP id v1mr5377101wru.249.1631213271751; Thu, 09 Sep 2021 11:47:51 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id d129sm2511233wmd.23.2021.09.09.11.47.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:51 -0700 (PDT) Message-Id: <8d2e8be5795c9e32726fe519b1c1979a581e3a44.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:34 +0000 Subject: [PATCH v2 09/19] reftable: a generic binary tree implementation Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The reftable format includes support for an (OID => ref) map. This map can speed up visibility and reachability checks. In particular, various operations along the fetch/push path within Gerrit have ben sped up by using this structure. The map is constructed with help of a binary tree. Object IDs are hashes, so they are uniformly distributed. Hence, the tree does not attempt forced rebalancing. Signed-off-by: Han-Wen Nienhuys --- Makefile | 4 ++- reftable/tree.c | 63 ++++++++++++++++++++++++++++++++++++++++ reftable/tree.h | 34 ++++++++++++++++++++++ reftable/tree_test.c | 61 ++++++++++++++++++++++++++++++++++++++ t/helper/test-reftable.c | 1 + 5 files changed, 162 insertions(+), 1 deletion(-) create mode 100644 reftable/tree.c create mode 100644 reftable/tree.h create mode 100644 reftable/tree_test.c diff --git a/Makefile b/Makefile index 14a0a1f7643..157ff1edba8 100644 --- a/Makefile +++ b/Makefile @@ -2462,11 +2462,13 @@ REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o REFTABLE_OBJS += reftable/record.o +REFTABLE_OBJS += reftable/tree.o +REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o -REFTABLE_TEST_OBJS += reftable/basics_test.o +REFTABLE_TEST_OBJS += reftable/tree_test.o TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS)) diff --git a/reftable/tree.c b/reftable/tree.c new file mode 100644 index 00000000000..82db7995dd6 --- /dev/null +++ b/reftable/tree.c @@ -0,0 +1,63 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "tree.h" + +#include "basics.h" +#include "system.h" + +struct tree_node *tree_search(void *key, struct tree_node **rootp, + int (*compare)(const void *, const void *), + int insert) +{ + int res; + if (*rootp == NULL) { + if (!insert) { + return NULL; + } else { + struct tree_node *n = + reftable_calloc(sizeof(struct tree_node)); + n->key = key; + *rootp = n; + return *rootp; + } + } + + res = compare(key, (*rootp)->key); + if (res < 0) + return tree_search(key, &(*rootp)->left, compare, insert); + else if (res > 0) + return tree_search(key, &(*rootp)->right, compare, insert); + return *rootp; +} + +void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key), + void *arg) +{ + if (t->left) { + infix_walk(t->left, action, arg); + } + action(arg, t->key); + if (t->right) { + infix_walk(t->right, action, arg); + } +} + +void tree_free(struct tree_node *t) +{ + if (t == NULL) { + return; + } + if (t->left) { + tree_free(t->left); + } + if (t->right) { + tree_free(t->right); + } + reftable_free(t); +} diff --git a/reftable/tree.h b/reftable/tree.h new file mode 100644 index 00000000000..fbdd002e23a --- /dev/null +++ b/reftable/tree.h @@ -0,0 +1,34 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef TREE_H +#define TREE_H + +/* tree_node is a generic binary search tree. */ +struct tree_node { + void *key; + struct tree_node *left, *right; +}; + +/* looks for `key` in `rootp` using `compare` as comparison function. If insert + * is set, insert the key if it's not found. Else, return NULL. + */ +struct tree_node *tree_search(void *key, struct tree_node **rootp, + int (*compare)(const void *, const void *), + int insert); + +/* performs an infix walk of the tree. */ +void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key), + void *arg); + +/* + * deallocates the tree nodes recursively. Keys should be deallocated separately + * by walking over the tree. */ +void tree_free(struct tree_node *t); + +#endif diff --git a/reftable/tree_test.c b/reftable/tree_test.c new file mode 100644 index 00000000000..cbff1255886 --- /dev/null +++ b/reftable/tree_test.c @@ -0,0 +1,61 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "tree.h" + +#include "basics.h" +#include "record.h" +#include "test_framework.h" +#include "reftable-tests.h" + +static int test_compare(const void *a, const void *b) +{ + return (char *)a - (char *)b; +} + +struct curry { + void *last; +}; + +static void check_increasing(void *arg, void *key) +{ + struct curry *c = arg; + if (c->last) { + EXPECT(test_compare(c->last, key) < 0); + } + c->last = key; +} + +static void test_tree(void) +{ + struct tree_node *root = NULL; + + void *values[11] = { NULL }; + struct tree_node *nodes[11] = { NULL }; + int i = 1; + struct curry c = { NULL }; + do { + nodes[i] = tree_search(values + i, &root, &test_compare, 1); + i = (i * 7) % 11; + } while (i != 1); + + for (i = 1; i < ARRAY_SIZE(nodes); i++) { + EXPECT(values + i == nodes[i]->key); + EXPECT(nodes[i] == + tree_search(values + i, &root, &test_compare, 0)); + } + + infix_walk(root, check_increasing, &c); + tree_free(root); +} + +int tree_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_tree); + return 0; +} diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index c9deeaf08c7..050551fa698 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -6,5 +6,6 @@ int cmd__reftable(int argc, const char **argv) basics_test_main(argc, argv); block_test_main(argc, argv); record_test_main(argc, argv); + tree_test_main(argc, argv); return 0; } From patchwork Thu Sep 9 18:47:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79373C433F5 for ; Thu, 9 Sep 2021 18:48:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65EB0610FF for ; Thu, 9 Sep 2021 18:48:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243659AbhIIStx (ORCPT ); Thu, 9 Sep 2021 14:49:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245161AbhIIStJ (ORCPT ); Thu, 9 Sep 2021 14:49:09 -0400 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05462C0613DF for ; Thu, 9 Sep 2021 11:47:54 -0700 (PDT) Received: by mail-wm1-x333.google.com with SMTP id j17-20020a05600c1c1100b002e754875260so2126029wms.4 for ; Thu, 09 Sep 2021 11:47:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=1zZFRzTH7QA87ef0F5ZEw/+LIJEHVWGd0e0nv684iWE=; b=jAifTZyY+TcfBi5OyL3DDWn9mEvJ5byYN+hnuPsc8jR8G3K1gywSXAnysnL7AkSUnc Iq89hmixhcub+pvS69gLPp5JzVhPPwgOOTuAWaAxAFfxdBQyOIdQ2JWmwklmQgj++ShS A/xdjAm+reCXwIdAeFYkLFkYrD1KYLaxGSGD4gAkiPHdfsFL7ZzH7sPytnLIFPqZRPos 1w0SOnut1pL68yGJdgZ6A9OTT37cZ+t9qxmT9SXnSzEGUgktA0sc7ETWN5ahjqEk5ckL NmlBj1z/iKh/M+lvGHZ3d5wKLJRPx3gksENnFb7RLw6zwaj4B9MXSWlIr1WUohE/eV6w +sPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=1zZFRzTH7QA87ef0F5ZEw/+LIJEHVWGd0e0nv684iWE=; b=753tBbD+3XPmOU4lzhGEfABhVRBeQOOejvyOtXyrhxMpKMu9zZqSXPhTxakkf/2ZYe O1zWhl8HAfBt1uBlqCtJoVnv1iPxyjHaehRsIq/mSmktxfv3Xj/B+95wWoehRzwM8877 N0leQCe0ZGu6vPhfn3jVeAsI82W1Q51YMh9kJ/NH/k/TCmNGKxRePN5eIvL7hsJpqwGj i8l0k+fMXKvaSHEaqXRPaYPiBmeZnfdEJk16U0U/OOKzMTYD+qXv+09uwxj2ivuPoGnI VMPxj8CNejddY0Llp7EGAwQ3EboxYkb/fOf7TaEA1bjQiNVTm0QzoOUW9H46UAyxLVBU Waww== X-Gm-Message-State: AOAM533QxOFQtxj4g91cJpzTZ/f85ZMSEIMc9U4YMf6yIMVlnl6AJ4W5 8bK9k6Hunxzjl07F6UB5vgGmYofEx5M= X-Google-Smtp-Source: ABdhPJy/Yxh8SlQBIHhmISTbTWsLscVwYnIop9jCJ2uWNqbKjKlYVWerse4m2JSpvZmh2yjSJpLU6w== X-Received: by 2002:a1c:984c:: with SMTP id a73mr4400935wme.171.1631213272361; Thu, 09 Sep 2021 11:47:52 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l7sm2097836wmp.48.2021.09.09.11.47.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:52 -0700 (PDT) Message-Id: <965af6ec06563dd5eb82d0d3c178d5b4778a3792.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:35 +0000 Subject: [PATCH v2 10/19] reftable: write reftable files Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + reftable/reftable-writer.h | 148 ++++++++ reftable/writer.c | 690 +++++++++++++++++++++++++++++++++++++ reftable/writer.h | 50 +++ 4 files changed, 889 insertions(+) create mode 100644 reftable/reftable-writer.h create mode 100644 reftable/writer.c create mode 100644 reftable/writer.h diff --git a/Makefile b/Makefile index 157ff1edba8..de5c4026201 100644 --- a/Makefile +++ b/Makefile @@ -2463,6 +2463,7 @@ REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o REFTABLE_OBJS += reftable/record.o REFTABLE_OBJS += reftable/tree.o +REFTABLE_OBJS += reftable/writer.o REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h new file mode 100644 index 00000000000..af36462ced5 --- /dev/null +++ b/reftable/reftable-writer.h @@ -0,0 +1,148 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_WRITER_H +#define REFTABLE_WRITER_H + +#include "reftable-record.h" + +#include +#include /* ssize_t */ + +/* Writing single reftables */ + +/* reftable_write_options sets options for writing a single reftable. */ +struct reftable_write_options { + /* boolean: do not pad out blocks to block size. */ + unsigned unpadded : 1; + + /* the blocksize. Should be less than 2^24. */ + uint32_t block_size; + + /* boolean: do not generate a SHA1 => ref index. */ + unsigned skip_index_objects : 1; + + /* how often to write complete keys in each block. */ + int restart_interval; + + /* 4-byte identifier ("sha1", "s256") of the hash. + * Defaults to SHA1 if unset + */ + uint32_t hash_id; + + /* boolean: do not check ref names for validity or dir/file conflicts. + */ + unsigned skip_name_check : 1; + + /* boolean: copy log messages exactly. If unset, check that the message + * is a single line, and add '\n' if missing. + */ + unsigned exact_log_message : 1; +}; + +/* reftable_block_stats holds statistics for a single block type */ +struct reftable_block_stats { + /* total number of entries written */ + int entries; + /* total number of key restarts */ + int restarts; + /* total number of blocks */ + int blocks; + /* total number of index blocks */ + int index_blocks; + /* depth of the index */ + int max_index_level; + + /* offset of the first block for this type */ + uint64_t offset; + /* offset of the top level index block for this type, or 0 if not + * present */ + uint64_t index_offset; +}; + +/* stats holds overall statistics for a single reftable */ +struct reftable_stats { + /* total number of blocks written. */ + int blocks; + /* stats for ref data */ + struct reftable_block_stats ref_stats; + /* stats for the SHA1 to ref map. */ + struct reftable_block_stats obj_stats; + /* stats for index blocks */ + struct reftable_block_stats idx_stats; + /* stats for log blocks */ + struct reftable_block_stats log_stats; + + /* disambiguation length of shortened object IDs. */ + int object_id_len; +}; + +/* reftable_new_writer creates a new writer */ +struct reftable_writer * +reftable_new_writer(ssize_t (*writer_func)(void *, const void *, size_t), + void *writer_arg, struct reftable_write_options *opts); + +/* Set the range of update indices for the records we will add. When writing a + table into a stack, the min should be at least + reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. + + For transactional updates to a stack, typically min==max, and the + update_index can be obtained by inspeciting the stack. When converting an + existing ref database into a single reftable, this would be a range of + update-index timestamps. + */ +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, + uint64_t max); + +/* + Add a reftable_ref_record. The record should have names that come after + already added records. + + The update_index must be within the limits set by + reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned. It is an + REFTABLE_API_ERROR error to write a ref record after a log record. +*/ +int reftable_writer_add_ref(struct reftable_writer *w, + struct reftable_ref_record *ref); + +/* + Convenience function to add multiple reftable_ref_records; the function sorts + the records before adding them, reordering the records array passed in. +*/ +int reftable_writer_add_refs(struct reftable_writer *w, + struct reftable_ref_record *refs, int n); + +/* + adds reftable_log_records. Log records are keyed by (refname, decreasing + update_index). The key for the record added must come after the already added + log records. +*/ +int reftable_writer_add_log(struct reftable_writer *w, + struct reftable_log_record *log); + +/* + Convenience function to add multiple reftable_log_records; the function sorts + the records before adding them, reordering records array passed in. +*/ +int reftable_writer_add_logs(struct reftable_writer *w, + struct reftable_log_record *logs, int n); + +/* reftable_writer_close finalizes the reftable. The writer is retained so + * statistics can be inspected. */ +int reftable_writer_close(struct reftable_writer *w); + +/* writer_stats returns the statistics on the reftable being written. + + This struct becomes invalid when the writer is freed. + */ +const struct reftable_stats *writer_stats(struct reftable_writer *w); + +/* reftable_writer_free deallocates memory for the writer */ +void reftable_writer_free(struct reftable_writer *w); + +#endif diff --git a/reftable/writer.c b/reftable/writer.c new file mode 100644 index 00000000000..3ca721e9f64 --- /dev/null +++ b/reftable/writer.c @@ -0,0 +1,690 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "writer.h" + +#include "system.h" + +#include "block.h" +#include "constants.h" +#include "record.h" +#include "tree.h" +#include "reftable-error.h" + +/* finishes a block, and writes it to storage */ +static int writer_flush_block(struct reftable_writer *w); + +/* deallocates memory related to the index */ +static void writer_clear_index(struct reftable_writer *w); + +/* finishes writing a 'r' (refs) or 'g' (reflogs) section */ +static int writer_finish_public_section(struct reftable_writer *w); + +static struct reftable_block_stats * +writer_reftable_block_stats(struct reftable_writer *w, uint8_t typ) +{ + switch (typ) { + case 'r': + return &w->stats.ref_stats; + case 'o': + return &w->stats.obj_stats; + case 'i': + return &w->stats.idx_stats; + case 'g': + return &w->stats.log_stats; + } + abort(); + return NULL; +} + +/* write data, queuing the padding for the next write. Returns negative for + * error. */ +static int padded_write(struct reftable_writer *w, uint8_t *data, size_t len, + int padding) +{ + int n = 0; + if (w->pending_padding > 0) { + uint8_t *zeroed = reftable_calloc(w->pending_padding); + int n = w->write(w->write_arg, zeroed, w->pending_padding); + if (n < 0) + return n; + + w->pending_padding = 0; + reftable_free(zeroed); + } + + w->pending_padding = padding; + n = w->write(w->write_arg, data, len); + if (n < 0) + return n; + n += padding; + return 0; +} + +static void options_set_defaults(struct reftable_write_options *opts) +{ + if (opts->restart_interval == 0) { + opts->restart_interval = 16; + } + + if (opts->hash_id == 0) { + opts->hash_id = GIT_SHA1_FORMAT_ID; + } + if (opts->block_size == 0) { + opts->block_size = DEFAULT_BLOCK_SIZE; + } +} + +static int writer_version(struct reftable_writer *w) +{ + return (w->opts.hash_id == 0 || w->opts.hash_id == GIT_SHA1_FORMAT_ID) ? + 1 : + 2; +} + +static int writer_write_header(struct reftable_writer *w, uint8_t *dest) +{ + memcpy(dest, "REFT", 4); + + dest[4] = writer_version(w); + + put_be24(dest + 5, w->opts.block_size); + put_be64(dest + 8, w->min_update_index); + put_be64(dest + 16, w->max_update_index); + if (writer_version(w) == 2) { + put_be32(dest + 24, w->opts.hash_id); + } + return header_size(writer_version(w)); +} + +static void writer_reinit_block_writer(struct reftable_writer *w, uint8_t typ) +{ + int block_start = 0; + if (w->next == 0) { + block_start = header_size(writer_version(w)); + } + + strbuf_release(&w->last_key); + block_writer_init(&w->block_writer_data, typ, w->block, + w->opts.block_size, block_start, + hash_size(w->opts.hash_id)); + w->block_writer = &w->block_writer_data; + w->block_writer->restart_interval = w->opts.restart_interval; +} + +static struct strbuf reftable_empty_strbuf = STRBUF_INIT; + +struct reftable_writer * +reftable_new_writer(ssize_t (*writer_func)(void *, const void *, size_t), + void *writer_arg, struct reftable_write_options *opts) +{ + struct reftable_writer *wp = + reftable_calloc(sizeof(struct reftable_writer)); + strbuf_init(&wp->block_writer_data.last_key, 0); + options_set_defaults(opts); + if (opts->block_size >= (1 << 24)) { + /* TODO - error return? */ + abort(); + } + wp->last_key = reftable_empty_strbuf; + wp->block = reftable_calloc(opts->block_size); + wp->write = writer_func; + wp->write_arg = writer_arg; + wp->opts = *opts; + writer_reinit_block_writer(wp, BLOCK_TYPE_REF); + + return wp; +} + +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, + uint64_t max) +{ + w->min_update_index = min; + w->max_update_index = max; +} + +void reftable_writer_free(struct reftable_writer *w) +{ + reftable_free(w->block); + reftable_free(w); +} + +struct obj_index_tree_node { + struct strbuf hash; + uint64_t *offsets; + size_t offset_len; + size_t offset_cap; +}; + +#define OBJ_INDEX_TREE_NODE_INIT \ + { \ + .hash = STRBUF_INIT \ + } + +static int obj_index_tree_node_compare(const void *a, const void *b) +{ + return strbuf_cmp(&((const struct obj_index_tree_node *)a)->hash, + &((const struct obj_index_tree_node *)b)->hash); +} + +static void writer_index_hash(struct reftable_writer *w, struct strbuf *hash) +{ + uint64_t off = w->next; + + struct obj_index_tree_node want = { .hash = *hash }; + + struct tree_node *node = tree_search(&want, &w->obj_index_tree, + &obj_index_tree_node_compare, 0); + struct obj_index_tree_node *key = NULL; + if (node == NULL) { + struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT; + key = reftable_malloc(sizeof(struct obj_index_tree_node)); + *key = empty; + + strbuf_reset(&key->hash); + strbuf_addbuf(&key->hash, hash); + tree_search((void *)key, &w->obj_index_tree, + &obj_index_tree_node_compare, 1); + } else { + key = node->key; + } + + if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) { + return; + } + + if (key->offset_len == key->offset_cap) { + key->offset_cap = 2 * key->offset_cap + 1; + key->offsets = reftable_realloc( + key->offsets, sizeof(uint64_t) * key->offset_cap); + } + + key->offsets[key->offset_len++] = off; +} + +static int writer_add_record(struct reftable_writer *w, + struct reftable_record *rec) +{ + struct strbuf key = STRBUF_INIT; + int err = -1; + reftable_record_key(rec, &key); + if (strbuf_cmp(&w->last_key, &key) >= 0) { + err = REFTABLE_API_ERROR; + goto done; + } + + strbuf_reset(&w->last_key); + strbuf_addbuf(&w->last_key, &key); + if (w->block_writer == NULL) { + writer_reinit_block_writer(w, reftable_record_type(rec)); + } + + assert(block_writer_type(w->block_writer) == reftable_record_type(rec)); + + if (block_writer_add(w->block_writer, rec) == 0) { + err = 0; + goto done; + } + + err = writer_flush_block(w); + if (err < 0) { + goto done; + } + + writer_reinit_block_writer(w, reftable_record_type(rec)); + err = block_writer_add(w->block_writer, rec); + if (err < 0) { + goto done; + } + + err = 0; +done: + strbuf_release(&key); + return err; +} + +int reftable_writer_add_ref(struct reftable_writer *w, + struct reftable_ref_record *ref) +{ + struct reftable_record rec = { NULL }; + struct reftable_ref_record copy = *ref; + int err = 0; + + if (ref->refname == NULL) + return REFTABLE_API_ERROR; + if (ref->update_index < w->min_update_index || + ref->update_index > w->max_update_index) + return REFTABLE_API_ERROR; + + reftable_record_from_ref(&rec, ©); + copy.update_index -= w->min_update_index; + + err = writer_add_record(w, &rec); + if (err < 0) + return err; + + if (!w->opts.skip_index_objects && reftable_ref_record_val1(ref)) { + struct strbuf h = STRBUF_INIT; + strbuf_add(&h, (char *)reftable_ref_record_val1(ref), + hash_size(w->opts.hash_id)); + writer_index_hash(w, &h); + strbuf_release(&h); + } + + if (!w->opts.skip_index_objects && reftable_ref_record_val2(ref)) { + struct strbuf h = STRBUF_INIT; + strbuf_add(&h, reftable_ref_record_val2(ref), + hash_size(w->opts.hash_id)); + writer_index_hash(w, &h); + strbuf_release(&h); + } + return 0; +} + +int reftable_writer_add_refs(struct reftable_writer *w, + struct reftable_ref_record *refs, int n) +{ + int err = 0; + int i = 0; + QSORT(refs, n, reftable_ref_record_compare_name); + for (i = 0; err == 0 && i < n; i++) { + err = reftable_writer_add_ref(w, &refs[i]); + } + return err; +} + +static int reftable_writer_add_log_verbatim(struct reftable_writer *w, + struct reftable_log_record *log) +{ + struct reftable_record rec = { NULL }; + if (w->block_writer && + block_writer_type(w->block_writer) == BLOCK_TYPE_REF) { + int err = writer_finish_public_section(w); + if (err < 0) + return err; + } + + w->next -= w->pending_padding; + w->pending_padding = 0; + + reftable_record_from_log(&rec, log); + return writer_add_record(w, &rec); +} + +int reftable_writer_add_log(struct reftable_writer *w, + struct reftable_log_record *log) +{ + char *input_log_message = NULL; + struct strbuf cleaned_message = STRBUF_INIT; + int err = 0; + + if (log->value_type == REFTABLE_LOG_DELETION) + return reftable_writer_add_log_verbatim(w, log); + + if (log->refname == NULL) + return REFTABLE_API_ERROR; + + input_log_message = log->value.update.message; + if (!w->opts.exact_log_message && log->value.update.message) { + strbuf_addstr(&cleaned_message, log->value.update.message); + while (cleaned_message.len && + cleaned_message.buf[cleaned_message.len - 1] == '\n') + strbuf_setlen(&cleaned_message, + cleaned_message.len - 1); + if (strchr(cleaned_message.buf, '\n')) { + /* multiple lines not allowed. */ + err = REFTABLE_API_ERROR; + goto done; + } + strbuf_addstr(&cleaned_message, "\n"); + log->value.update.message = cleaned_message.buf; + } + + err = reftable_writer_add_log_verbatim(w, log); + log->value.update.message = input_log_message; +done: + strbuf_release(&cleaned_message); + return err; +} + +int reftable_writer_add_logs(struct reftable_writer *w, + struct reftable_log_record *logs, int n) +{ + int err = 0; + int i = 0; + QSORT(logs, n, reftable_log_record_compare_key); + + for (i = 0; err == 0 && i < n; i++) { + err = reftable_writer_add_log(w, &logs[i]); + } + return err; +} + +static int writer_finish_section(struct reftable_writer *w) +{ + uint8_t typ = block_writer_type(w->block_writer); + uint64_t index_start = 0; + int max_level = 0; + int threshold = w->opts.unpadded ? 1 : 3; + int before_blocks = w->stats.idx_stats.blocks; + int err = writer_flush_block(w); + int i = 0; + struct reftable_block_stats *bstats = NULL; + if (err < 0) + return err; + + while (w->index_len > threshold) { + struct reftable_index_record *idx = NULL; + int idx_len = 0; + + max_level++; + index_start = w->next; + writer_reinit_block_writer(w, BLOCK_TYPE_INDEX); + + idx = w->index; + idx_len = w->index_len; + + w->index = NULL; + w->index_len = 0; + w->index_cap = 0; + for (i = 0; i < idx_len; i++) { + struct reftable_record rec = { NULL }; + reftable_record_from_index(&rec, idx + i); + if (block_writer_add(w->block_writer, &rec) == 0) { + continue; + } + + err = writer_flush_block(w); + if (err < 0) + return err; + + writer_reinit_block_writer(w, BLOCK_TYPE_INDEX); + + err = block_writer_add(w->block_writer, &rec); + if (err != 0) { + /* write into fresh block should always succeed + */ + abort(); + } + } + for (i = 0; i < idx_len; i++) { + strbuf_release(&idx[i].last_key); + } + reftable_free(idx); + } + + writer_clear_index(w); + + err = writer_flush_block(w); + if (err < 0) + return err; + + bstats = writer_reftable_block_stats(w, typ); + bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks; + bstats->index_offset = index_start; + bstats->max_index_level = max_level; + + /* Reinit lastKey, as the next section can start with any key. */ + w->last_key.len = 0; + + return 0; +} + +struct common_prefix_arg { + struct strbuf *last; + int max; +}; + +static void update_common(void *void_arg, void *key) +{ + struct common_prefix_arg *arg = void_arg; + struct obj_index_tree_node *entry = key; + if (arg->last) { + int n = common_prefix_size(&entry->hash, arg->last); + if (n > arg->max) { + arg->max = n; + } + } + arg->last = &entry->hash; +} + +struct write_record_arg { + struct reftable_writer *w; + int err; +}; + +static void write_object_record(void *void_arg, void *key) +{ + struct write_record_arg *arg = void_arg; + struct obj_index_tree_node *entry = key; + struct reftable_obj_record obj_rec = { + .hash_prefix = (uint8_t *)entry->hash.buf, + .hash_prefix_len = arg->w->stats.object_id_len, + .offsets = entry->offsets, + .offset_len = entry->offset_len, + }; + struct reftable_record rec = { NULL }; + if (arg->err < 0) + goto done; + + reftable_record_from_obj(&rec, &obj_rec); + arg->err = block_writer_add(arg->w->block_writer, &rec); + if (arg->err == 0) + goto done; + + arg->err = writer_flush_block(arg->w); + if (arg->err < 0) + goto done; + + writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ); + arg->err = block_writer_add(arg->w->block_writer, &rec); + if (arg->err == 0) + goto done; + obj_rec.offset_len = 0; + arg->err = block_writer_add(arg->w->block_writer, &rec); + + /* Should be able to write into a fresh block. */ + assert(arg->err == 0); + +done:; +} + +static void object_record_free(void *void_arg, void *key) +{ + struct obj_index_tree_node *entry = key; + + FREE_AND_NULL(entry->offsets); + strbuf_release(&entry->hash); + reftable_free(entry); +} + +static int writer_dump_object_index(struct reftable_writer *w) +{ + struct write_record_arg closure = { .w = w }; + struct common_prefix_arg common = { NULL }; + if (w->obj_index_tree) { + infix_walk(w->obj_index_tree, &update_common, &common); + } + w->stats.object_id_len = common.max + 1; + + writer_reinit_block_writer(w, BLOCK_TYPE_OBJ); + + if (w->obj_index_tree) { + infix_walk(w->obj_index_tree, &write_object_record, &closure); + } + + if (closure.err < 0) + return closure.err; + return writer_finish_section(w); +} + +static int writer_finish_public_section(struct reftable_writer *w) +{ + uint8_t typ = 0; + int err = 0; + + if (w->block_writer == NULL) + return 0; + + typ = block_writer_type(w->block_writer); + err = writer_finish_section(w); + if (err < 0) + return err; + if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects && + w->stats.ref_stats.index_blocks > 0) { + err = writer_dump_object_index(w); + if (err < 0) + return err; + } + + if (w->obj_index_tree) { + infix_walk(w->obj_index_tree, &object_record_free, NULL); + tree_free(w->obj_index_tree); + w->obj_index_tree = NULL; + } + + w->block_writer = NULL; + return 0; +} + +int reftable_writer_close(struct reftable_writer *w) +{ + uint8_t footer[72]; + uint8_t *p = footer; + int err = writer_finish_public_section(w); + int empty_table = w->next == 0; + if (err != 0) + goto done; + w->pending_padding = 0; + if (empty_table) { + /* Empty tables need a header anyway. */ + uint8_t header[28]; + int n = writer_write_header(w, header); + err = padded_write(w, header, n, 0); + if (err < 0) + goto done; + } + + p += writer_write_header(w, footer); + put_be64(p, w->stats.ref_stats.index_offset); + p += 8; + put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len); + p += 8; + put_be64(p, w->stats.obj_stats.index_offset); + p += 8; + + put_be64(p, w->stats.log_stats.offset); + p += 8; + put_be64(p, w->stats.log_stats.index_offset); + p += 8; + + put_be32(p, crc32(0, footer, p - footer)); + p += 4; + + err = padded_write(w, footer, footer_size(writer_version(w)), 0); + if (err < 0) + goto done; + + if (empty_table) { + err = REFTABLE_EMPTY_TABLE_ERROR; + goto done; + } + +done: + /* free up memory. */ + block_writer_release(&w->block_writer_data); + writer_clear_index(w); + strbuf_release(&w->last_key); + return err; +} + +static void writer_clear_index(struct reftable_writer *w) +{ + int i = 0; + for (i = 0; i < w->index_len; i++) { + strbuf_release(&w->index[i].last_key); + } + + FREE_AND_NULL(w->index); + w->index_len = 0; + w->index_cap = 0; +} + +static const int debug = 0; + +static int writer_flush_nonempty_block(struct reftable_writer *w) +{ + uint8_t typ = block_writer_type(w->block_writer); + struct reftable_block_stats *bstats = + writer_reftable_block_stats(w, typ); + uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0; + int raw_bytes = block_writer_finish(w->block_writer); + int padding = 0; + int err = 0; + struct reftable_index_record ir = { .last_key = STRBUF_INIT }; + if (raw_bytes < 0) + return raw_bytes; + + if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) { + padding = w->opts.block_size - raw_bytes; + } + + if (block_typ_off > 0) { + bstats->offset = block_typ_off; + } + + bstats->entries += w->block_writer->entries; + bstats->restarts += w->block_writer->restart_len; + bstats->blocks++; + w->stats.blocks++; + + if (debug) { + fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ, + w->next, raw_bytes, + get_be24(w->block + w->block_writer->header_off + 1)); + } + + if (w->next == 0) { + writer_write_header(w, w->block); + } + + err = padded_write(w, w->block, raw_bytes, padding); + if (err < 0) + return err; + + if (w->index_cap == w->index_len) { + w->index_cap = 2 * w->index_cap + 1; + w->index = reftable_realloc( + w->index, + sizeof(struct reftable_index_record) * w->index_cap); + } + + ir.offset = w->next; + strbuf_reset(&ir.last_key); + strbuf_addbuf(&ir.last_key, &w->block_writer->last_key); + w->index[w->index_len] = ir; + + w->index_len++; + w->next += padding + raw_bytes; + w->block_writer = NULL; + return 0; +} + +static int writer_flush_block(struct reftable_writer *w) +{ + if (w->block_writer == NULL) + return 0; + if (w->block_writer->entries == 0) + return 0; + return writer_flush_nonempty_block(w); +} + +const struct reftable_stats *writer_stats(struct reftable_writer *w) +{ + return &w->stats; +} diff --git a/reftable/writer.h b/reftable/writer.h new file mode 100644 index 00000000000..09b88673d97 --- /dev/null +++ b/reftable/writer.h @@ -0,0 +1,50 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef WRITER_H +#define WRITER_H + +#include "basics.h" +#include "block.h" +#include "tree.h" +#include "reftable-writer.h" + +struct reftable_writer { + ssize_t (*write)(void *, const void *, size_t); + void *write_arg; + int pending_padding; + struct strbuf last_key; + + /* offset of next block to write. */ + uint64_t next; + uint64_t min_update_index, max_update_index; + struct reftable_write_options opts; + + /* memory buffer for writing */ + uint8_t *block; + + /* writer for the current section. NULL or points to + * block_writer_data */ + struct block_writer *block_writer; + + struct block_writer block_writer_data; + + /* pending index records for the current section */ + struct reftable_index_record *index; + size_t index_len; + size_t index_cap; + + /* + * tree for use with tsearch; used to populate the 'o' inverse OID + * map */ + struct tree_node *obj_index_tree; + + struct reftable_stats stats; +}; + +#endif From patchwork Thu Sep 9 18:47:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483677 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8425CC433EF for ; Thu, 9 Sep 2021 18:48:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 66D38610FF for ; Thu, 9 Sep 2021 18:48:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245054AbhIIStz (ORCPT ); Thu, 9 Sep 2021 14:49:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245175AbhIIStK (ORCPT ); Thu, 9 Sep 2021 14:49:10 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7483FC0613E1 for ; Thu, 9 Sep 2021 11:47:54 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id q11so3939137wrr.9 for ; Thu, 09 Sep 2021 11:47:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=kIMjQ3LksAJ4FsdkDFwrNF35n1GM4uxd0AJgPt0Xte4=; b=aKV2AKlQFnRQ7KDlXs9DpkhoeOYcdOqgY8ydFzDenEjC9zJbvtNrr6N8yZ0HbAxKgB NL5c4xl08qJueQUvBDJ3HEPKKbiCMYjsqmxmg84OzG6mFE5YAMdNn3CLoAV2nBxqOA12 SF3rz6SQKw092Zqm91uth2UwKa8HpeytfZxc3XsyyprCgMNbeJ3Hw6G5hm/ckn/Jppck Dyoh49WfWhfVkuW7PestPQ7nHFc7J6APJzw+OTUXeXqCqaHTEB4kPIn4NCAryfzwFb5B rJ8EA+u9AamO7xuehQE2muG1pFMoT50H3/dRTZFrA1nIMKbduyt+ErSu9ZvQ5CrfkS3C Qaew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=kIMjQ3LksAJ4FsdkDFwrNF35n1GM4uxd0AJgPt0Xte4=; b=cdx1wUldygeTcVONseFzm1sGb9iCh8PyX/eGLKucxMJxG4t/P4SO10Opy+901pPdcD SOuR3e0Zh9FRB6yzfc0zsqYmZjFG7W9JoihBsSkwNqdyPr0MWTFoSgIgJoYnc5yGum3Y buQoo+Jz6WcEFLCiZxYkmBzdveReAl8v5zdceapLW49evhVhRHisQX/ajtG1zWFnWb+w /kXAR44cBqdKkZpVLCEDm89fayZSBlJkBoqmbXbeyaxId2iZIbUlk6oBMc8q0lE0HXeu D6dPSXVuu9GHzXKd4VPi/pfJgWUwr0VJeyn6+T/UZAae1o+ffTlL3YrmBsMYevRPJ7j5 Xf6g== X-Gm-Message-State: AOAM5317CT3FANZhle4zpGiAvmI1KqaFnVPAz9MM/4wdhW5ApkRF/x6M vkfLzfhQSYIB/USV/AM0t4zEBEhszs0= X-Google-Smtp-Source: ABdhPJxVaamI5cdVxE1LZdjk9lVuZrQHKAjRTVMt22vb82rEqJHYlLOrZLD8Id/PPE7nUHtHfBvGIg== X-Received: by 2002:a05:6000:22d:: with SMTP id l13mr5383372wrz.410.1631213273001; Thu, 09 Sep 2021 11:47:53 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y4sm2285035wmi.22.2021.09.09.11.47.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:52 -0700 (PDT) Message-Id: <4e259380fda937e55140fe1feff07bb69061646b.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:36 +0000 Subject: [PATCH v2 11/19] reftable: generic interface to tables Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys Signed-off-by: Han-Wen Nienhuys --- Makefile | 3 + reftable/generic.c | 169 +++++++++++++++++++++++++++++++++++ reftable/generic.h | 32 +++++++ reftable/reftable-generic.h | 47 ++++++++++ reftable/reftable-iterator.h | 39 ++++++++ reftable/reftable.c | 115 ++++++++++++++++++++++++ 6 files changed, 405 insertions(+) create mode 100644 reftable/generic.c create mode 100644 reftable/generic.h create mode 100644 reftable/reftable-generic.h create mode 100644 reftable/reftable-iterator.h create mode 100644 reftable/reftable.c diff --git a/Makefile b/Makefile index de5c4026201..02cb06f82ec 100644 --- a/Makefile +++ b/Makefile @@ -2462,6 +2462,9 @@ REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/publicbasics.o REFTABLE_OBJS += reftable/record.o +REFTABLE_OBJS += reftable/refname.o +REFTABLE_OBJS += reftable/generic.o +REFTABLE_OBJS += reftable/stack.o REFTABLE_OBJS += reftable/tree.o REFTABLE_OBJS += reftable/writer.o diff --git a/reftable/generic.c b/reftable/generic.c new file mode 100644 index 00000000000..7a8a738d860 --- /dev/null +++ b/reftable/generic.c @@ -0,0 +1,169 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "basics.h" +#include "record.h" +#include "generic.h" +#include "reftable-iterator.h" +#include "reftable-generic.h" + +int reftable_table_seek_ref(struct reftable_table *tab, + struct reftable_iterator *it, const char *name) +{ + struct reftable_ref_record ref = { + .refname = (char *)name, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, &ref); + return tab->ops->seek_record(tab->table_arg, it, &rec); +} + +int reftable_table_seek_log(struct reftable_table *tab, + struct reftable_iterator *it, const char *name) +{ + struct reftable_log_record log = { + .refname = (char *)name, + .update_index = ~((uint64_t)0), + }; + struct reftable_record rec = { NULL }; + reftable_record_from_log(&rec, &log); + return tab->ops->seek_record(tab->table_arg, it, &rec); +} + +int reftable_table_read_ref(struct reftable_table *tab, const char *name, + struct reftable_ref_record *ref) +{ + struct reftable_iterator it = { NULL }; + int err = reftable_table_seek_ref(tab, &it, name); + if (err) + goto done; + + err = reftable_iterator_next_ref(&it, ref); + if (err) + goto done; + + if (strcmp(ref->refname, name) || + reftable_ref_record_is_deletion(ref)) { + reftable_ref_record_release(ref); + err = 1; + goto done; + } + +done: + reftable_iterator_destroy(&it); + return err; +} + +int reftable_table_print(struct reftable_table *tab) { + struct reftable_iterator it = { NULL }; + struct reftable_ref_record ref = { NULL }; + struct reftable_log_record log = { NULL }; + uint32_t hash_id = reftable_table_hash_id(tab); + int err = reftable_table_seek_ref(tab, &it, ""); + if (err < 0) { + return err; + } + + while (1) { + err = reftable_iterator_next_ref(&it, &ref); + if (err > 0) { + break; + } + if (err < 0) { + return err; + } + reftable_ref_record_print(&ref, hash_id); + } + reftable_iterator_destroy(&it); + reftable_ref_record_release(&ref); + + err = reftable_table_seek_log(tab, &it, ""); + if (err < 0) { + return err; + } + while (1) { + err = reftable_iterator_next_log(&it, &log); + if (err > 0) { + break; + } + if (err < 0) { + return err; + } + reftable_log_record_print(&log, hash_id); + } + reftable_iterator_destroy(&it); + reftable_log_record_release(&log); + return 0; +} + +uint64_t reftable_table_max_update_index(struct reftable_table *tab) +{ + return tab->ops->max_update_index(tab->table_arg); +} + +uint64_t reftable_table_min_update_index(struct reftable_table *tab) +{ + return tab->ops->min_update_index(tab->table_arg); +} + +uint32_t reftable_table_hash_id(struct reftable_table *tab) +{ + return tab->ops->hash_id(tab->table_arg); +} + +void reftable_iterator_destroy(struct reftable_iterator *it) +{ + if (!it->ops) { + return; + } + it->ops->close(it->iter_arg); + it->ops = NULL; + FREE_AND_NULL(it->iter_arg); +} + +int reftable_iterator_next_ref(struct reftable_iterator *it, + struct reftable_ref_record *ref) +{ + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, ref); + return iterator_next(it, &rec); +} + +int reftable_iterator_next_log(struct reftable_iterator *it, + struct reftable_log_record *log) +{ + struct reftable_record rec = { NULL }; + reftable_record_from_log(&rec, log); + return iterator_next(it, &rec); +} + +int iterator_next(struct reftable_iterator *it, struct reftable_record *rec) +{ + return it->ops->next(it->iter_arg, rec); +} + +static int empty_iterator_next(void *arg, struct reftable_record *rec) +{ + return 1; +} + +static void empty_iterator_close(void *arg) +{ +} + +static struct reftable_iterator_vtable empty_vtable = { + .next = &empty_iterator_next, + .close = &empty_iterator_close, +}; + +void iterator_set_empty(struct reftable_iterator *it) +{ + assert(!it->ops); + it->iter_arg = NULL; + it->ops = &empty_vtable; +} diff --git a/reftable/generic.h b/reftable/generic.h new file mode 100644 index 00000000000..98886a06402 --- /dev/null +++ b/reftable/generic.h @@ -0,0 +1,32 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef GENERIC_H +#define GENERIC_H + +#include "record.h" +#include "reftable-generic.h" + +/* generic interface to reftables */ +struct reftable_table_vtable { + int (*seek_record)(void *tab, struct reftable_iterator *it, + struct reftable_record *); + uint32_t (*hash_id)(void *tab); + uint64_t (*min_update_index)(void *tab); + uint64_t (*max_update_index)(void *tab); +}; + +struct reftable_iterator_vtable { + int (*next)(void *iter_arg, struct reftable_record *rec); + void (*close)(void *iter_arg); +}; + +void iterator_set_empty(struct reftable_iterator *it); +int iterator_next(struct reftable_iterator *it, struct reftable_record *rec); + +#endif diff --git a/reftable/reftable-generic.h b/reftable/reftable-generic.h new file mode 100644 index 00000000000..d239751a778 --- /dev/null +++ b/reftable/reftable-generic.h @@ -0,0 +1,47 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_GENERIC_H +#define REFTABLE_GENERIC_H + +#include "reftable-iterator.h" + +struct reftable_table_vtable; + +/* + * Provides a unified API for reading tables, either merged tables, or single + * readers. */ +struct reftable_table { + struct reftable_table_vtable *ops; + void *table_arg; +}; + +int reftable_table_seek_log(struct reftable_table *tab, + struct reftable_iterator *it, const char *name); + +int reftable_table_seek_ref(struct reftable_table *tab, + struct reftable_iterator *it, const char *name); + +/* returns the hash ID from a generic reftable_table */ +uint32_t reftable_table_hash_id(struct reftable_table *tab); + +/* returns the max update_index covered by this table. */ +uint64_t reftable_table_max_update_index(struct reftable_table *tab); + +/* returns the min update_index covered by this table. */ +uint64_t reftable_table_min_update_index(struct reftable_table *tab); + +/* convenience function to read a single ref. Returns < 0 for error, 0 + for success, and 1 if ref not found. */ +int reftable_table_read_ref(struct reftable_table *tab, const char *name, + struct reftable_ref_record *ref); + +/* dump table contents onto stdout for debugging */ +int reftable_table_print(struct reftable_table *tab); + +#endif diff --git a/reftable/reftable-iterator.h b/reftable/reftable-iterator.h new file mode 100644 index 00000000000..d3eee7af357 --- /dev/null +++ b/reftable/reftable-iterator.h @@ -0,0 +1,39 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_ITERATOR_H +#define REFTABLE_ITERATOR_H + +#include "reftable-record.h" + +struct reftable_iterator_vtable; + +/* iterator is the generic interface for walking over data stored in a + * reftable. + */ +struct reftable_iterator { + struct reftable_iterator_vtable *ops; + void *iter_arg; +}; + +/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0: + * end of iteration. + */ +int reftable_iterator_next_ref(struct reftable_iterator *it, + struct reftable_ref_record *ref); + +/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0: + * end of iteration. + */ +int reftable_iterator_next_log(struct reftable_iterator *it, + struct reftable_log_record *log); + +/* releases resources associated with an iterator. */ +void reftable_iterator_destroy(struct reftable_iterator *it); + +#endif diff --git a/reftable/reftable.c b/reftable/reftable.c new file mode 100644 index 00000000000..0e4607a7cd6 --- /dev/null +++ b/reftable/reftable.c @@ -0,0 +1,115 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "basics.h" +#include "record.h" +#include "generic.h" +#include "reftable-iterator.h" +#include "reftable-generic.h" + +int reftable_table_seek_ref(struct reftable_table *tab, + struct reftable_iterator *it, const char *name) +{ + struct reftable_ref_record ref = { + .refname = (char *)name, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, &ref); + return tab->ops->seek_record(tab->table_arg, it, &rec); +} + +int reftable_table_read_ref(struct reftable_table *tab, const char *name, + struct reftable_ref_record *ref) +{ + struct reftable_iterator it = { NULL }; + int err = reftable_table_seek_ref(tab, &it, name); + if (err) + goto done; + + err = reftable_iterator_next_ref(&it, ref); + if (err) + goto done; + + if (strcmp(ref->refname, name) || + reftable_ref_record_is_deletion(ref)) { + reftable_ref_record_release(ref); + err = 1; + goto done; + } + +done: + reftable_iterator_destroy(&it); + return err; +} + +uint64_t reftable_table_max_update_index(struct reftable_table *tab) +{ + return tab->ops->max_update_index(tab->table_arg); +} + +uint64_t reftable_table_min_update_index(struct reftable_table *tab) +{ + return tab->ops->min_update_index(tab->table_arg); +} + +uint32_t reftable_table_hash_id(struct reftable_table *tab) +{ + return tab->ops->hash_id(tab->table_arg); +} + +void reftable_iterator_destroy(struct reftable_iterator *it) +{ + if (!it->ops) { + return; + } + it->ops->close(it->iter_arg); + it->ops = NULL; + FREE_AND_NULL(it->iter_arg); +} + +int reftable_iterator_next_ref(struct reftable_iterator *it, + struct reftable_ref_record *ref) +{ + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, ref); + return iterator_next(it, &rec); +} + +int reftable_iterator_next_log(struct reftable_iterator *it, + struct reftable_log_record *log) +{ + struct reftable_record rec = { NULL }; + reftable_record_from_log(&rec, log); + return iterator_next(it, &rec); +} + +int iterator_next(struct reftable_iterator *it, struct reftable_record *rec) +{ + return it->ops->next(it->iter_arg, rec); +} + +static int empty_iterator_next(void *arg, struct reftable_record *rec) +{ + return 1; +} + +static void empty_iterator_close(void *arg) +{ +} + +static struct reftable_iterator_vtable empty_vtable = { + .next = &empty_iterator_next, + .close = &empty_iterator_close, +}; + +void iterator_set_empty(struct reftable_iterator *it) +{ + assert(!it->ops); + it->iter_arg = NULL; + it->ops = &empty_vtable; +} From patchwork Thu Sep 9 18:47:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483679 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 450A7C433FE for ; Thu, 9 Sep 2021 18:48:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 27A7F610FF for ; Thu, 9 Sep 2021 18:48:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343593AbhIISt6 (ORCPT ); Thu, 9 Sep 2021 14:49:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245244AbhIIStK (ORCPT ); Thu, 9 Sep 2021 14:49:10 -0400 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62A23C0613E6 for ; Thu, 9 Sep 2021 11:47:55 -0700 (PDT) Received: by mail-wm1-x330.google.com with SMTP id l18-20020a05600c4f1200b002f8cf606262so2260467wmq.1 for ; Thu, 09 Sep 2021 11:47:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=WZ9SqMdrrdPwTe12Nmh3m7DNDyEeczBo6nMNDZJXCn4=; b=XGNspvPC0q2HSgaJHkx3DWSY+bmPXpB+kX2muEJYnqflYQpVNv2BvOyJAbGeBtDGMO Qam1+7usAgrYIaZHA1J4Hx53zRIDQJ00YyUjp/1ITMpU+ZmDG6VmzX2wb7tZpvYYDBPY J7H+YNCUAuWkGy50jtXA+ONwFnzvaieB/I1XkN6aN5aNdnp4WSo780xoeT9f/3cNx+mw JCRIbY614f9SsOcULhHMvdJ8KY7NlhkWRZLOfLPJHO2NrY1i6Wc+a8AtHvOG6H7SPK1i q9M+EdAXlJCA+qb6CsKN9wBTxix56nacRC8+j8hfIg+l+veZd9Ma6GzCl7fZA20CGaPF kTLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=WZ9SqMdrrdPwTe12Nmh3m7DNDyEeczBo6nMNDZJXCn4=; b=iRP9rIPPUqqkUryy/lpRfLu7Vchjt9rYdt63gzKuKrz7LShr2oXESTRIYco5pPqjSa CQVPeIN3ioUgo5hCJsaoxnWdJy0GJbe8EzvmtZRTpbAxcQtbtvWj0Gzs92lId3+3zqCh rldITzD1r06c8kNjIAQ9HASJBx/6kwgHtTmjgB9rBwyJ4lDKfMeP5TDb6Xw7COhwdY1t jM4Mb1gYoUBf8fcH+3iJtEymDGv4YGvTQKePLmrGfhYwfnAmp9UNEO0pDmmN59yU6jYE cBISVNCzEXoULHUBHX3jrFMwbLTUKfuC0xdlDYzLGM3gS9qDpn8k0zSjaYc09PTatTBZ MSCw== X-Gm-Message-State: AOAM531s2zZvFm3Kl6eaddQh1OaGJYD942sEAe6neJ9L0v9/HCnFVdvj ZzaxIIHVlV8m1uzaA0rgEHPuWP2wyck= X-Google-Smtp-Source: ABdhPJxSBouhvMElpX/azk5f6jCK4x6mVL1dEqQ49stVtSVJ9QVWYpQ0KsRxGetU23/OuxQHBPEbdg== X-Received: by 2002:a1c:ed0a:: with SMTP id l10mr4632605wmh.140.1631213273709; Thu, 09 Sep 2021 11:47:53 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id q7sm2439138wrc.55.2021.09.09.11.47.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:53 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:37 +0000 Subject: [PATCH v2 12/19] reftable: read reftable files Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This supports reading a single reftable file. The commit introduces an abstract iterator type, which captures the usecases both of reading individual refs, and iterating over a segment of the ref namespace. Signed-off-by: Han-Wen Nienhuys --- Makefile | 2 + reftable/iter.c | 194 +++++++++ reftable/iter.h | 69 ++++ reftable/reader.c | 801 +++++++++++++++++++++++++++++++++++++ reftable/reader.h | 66 +++ reftable/reftable-reader.h | 101 +++++ 6 files changed, 1233 insertions(+) create mode 100644 reftable/iter.c create mode 100644 reftable/iter.h create mode 100644 reftable/reader.c create mode 100644 reftable/reader.h create mode 100644 reftable/reftable-reader.h diff --git a/Makefile b/Makefile index 02cb06f82ec..c057c23d4a8 100644 --- a/Makefile +++ b/Makefile @@ -2460,7 +2460,9 @@ REFTABLE_OBJS += reftable/basics.o REFTABLE_OBJS += reftable/error.o REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o +REFTABLE_OBJS += reftable/iter.o REFTABLE_OBJS += reftable/publicbasics.o +REFTABLE_OBJS += reftable/reader.o REFTABLE_OBJS += reftable/record.o REFTABLE_OBJS += reftable/refname.o REFTABLE_OBJS += reftable/generic.o diff --git a/reftable/iter.c b/reftable/iter.c new file mode 100644 index 00000000000..93d04f735b8 --- /dev/null +++ b/reftable/iter.c @@ -0,0 +1,194 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "iter.h" + +#include "system.h" + +#include "block.h" +#include "generic.h" +#include "constants.h" +#include "reader.h" +#include "reftable-error.h" + +int iterator_is_null(struct reftable_iterator *it) +{ + return !it->ops; +} + +static void filtering_ref_iterator_close(void *iter_arg) +{ + struct filtering_ref_iterator *fri = iter_arg; + strbuf_release(&fri->oid); + reftable_iterator_destroy(&fri->it); +} + +static int filtering_ref_iterator_next(void *iter_arg, + struct reftable_record *rec) +{ + struct filtering_ref_iterator *fri = iter_arg; + struct reftable_ref_record *ref = rec->data; + int err = 0; + while (1) { + err = reftable_iterator_next_ref(&fri->it, ref); + if (err != 0) { + break; + } + + if (fri->double_check) { + struct reftable_iterator it = { NULL }; + + err = reftable_table_seek_ref(&fri->tab, &it, + ref->refname); + if (err == 0) { + err = reftable_iterator_next_ref(&it, ref); + } + + reftable_iterator_destroy(&it); + + if (err < 0) { + break; + } + + if (err > 0) { + continue; + } + } + + if (ref->value_type == REFTABLE_REF_VAL2 && + (!memcmp(fri->oid.buf, ref->value.val2.target_value, + fri->oid.len) || + !memcmp(fri->oid.buf, ref->value.val2.value, + fri->oid.len))) + return 0; + + if (ref->value_type == REFTABLE_REF_VAL1 && + !memcmp(fri->oid.buf, ref->value.val1, fri->oid.len)) { + return 0; + } + } + + reftable_ref_record_release(ref); + return err; +} + +static struct reftable_iterator_vtable filtering_ref_iterator_vtable = { + .next = &filtering_ref_iterator_next, + .close = &filtering_ref_iterator_close, +}; + +void iterator_from_filtering_ref_iterator(struct reftable_iterator *it, + struct filtering_ref_iterator *fri) +{ + assert(!it->ops); + it->iter_arg = fri; + it->ops = &filtering_ref_iterator_vtable; +} + +static void indexed_table_ref_iter_close(void *p) +{ + struct indexed_table_ref_iter *it = p; + block_iter_close(&it->cur); + reftable_block_done(&it->block_reader.block); + reftable_free(it->offsets); + strbuf_release(&it->oid); +} + +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it) +{ + uint64_t off; + int err = 0; + if (it->offset_idx == it->offset_len) { + it->is_finished = 1; + return 1; + } + + reftable_block_done(&it->block_reader.block); + + off = it->offsets[it->offset_idx++]; + err = reader_init_block_reader(it->r, &it->block_reader, off, + BLOCK_TYPE_REF); + if (err < 0) { + return err; + } + if (err > 0) { + /* indexed block does not exist. */ + return REFTABLE_FORMAT_ERROR; + } + block_reader_start(&it->block_reader, &it->cur); + return 0; +} + +static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec) +{ + struct indexed_table_ref_iter *it = p; + struct reftable_ref_record *ref = rec->data; + + while (1) { + int err = block_iter_next(&it->cur, rec); + if (err < 0) { + return err; + } + + if (err > 0) { + err = indexed_table_ref_iter_next_block(it); + if (err < 0) { + return err; + } + + if (it->is_finished) { + return 1; + } + continue; + } + /* BUG */ + if (!memcmp(it->oid.buf, ref->value.val2.target_value, + it->oid.len) || + !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) { + return 0; + } + } +} + +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest, + struct reftable_reader *r, uint8_t *oid, + int oid_len, uint64_t *offsets, int offset_len) +{ + struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT; + struct indexed_table_ref_iter *itr = + reftable_calloc(sizeof(struct indexed_table_ref_iter)); + int err = 0; + + *itr = empty; + itr->r = r; + strbuf_add(&itr->oid, oid, oid_len); + + itr->offsets = offsets; + itr->offset_len = offset_len; + + err = indexed_table_ref_iter_next_block(itr); + if (err < 0) { + reftable_free(itr); + } else { + *dest = itr; + } + return err; +} + +static struct reftable_iterator_vtable indexed_table_ref_iter_vtable = { + .next = &indexed_table_ref_iter_next, + .close = &indexed_table_ref_iter_close, +}; + +void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it, + struct indexed_table_ref_iter *itr) +{ + assert(!it->ops); + it->iter_arg = itr; + it->ops = &indexed_table_ref_iter_vtable; +} diff --git a/reftable/iter.h b/reftable/iter.h new file mode 100644 index 00000000000..09eb0cbfa59 --- /dev/null +++ b/reftable/iter.h @@ -0,0 +1,69 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef ITER_H +#define ITER_H + +#include "system.h" +#include "block.h" +#include "record.h" + +#include "reftable-iterator.h" +#include "reftable-generic.h" + +/* Returns true for a zeroed out iterator, such as the one returned from + * iterator_destroy. */ +int iterator_is_null(struct reftable_iterator *it); + +/* iterator that produces only ref records that point to `oid` */ +struct filtering_ref_iterator { + int double_check; + struct reftable_table tab; + struct strbuf oid; + struct reftable_iterator it; +}; +#define FILTERING_REF_ITERATOR_INIT \ + { \ + .oid = STRBUF_INIT \ + } + +void iterator_from_filtering_ref_iterator(struct reftable_iterator *, + struct filtering_ref_iterator *); + +/* iterator that produces only ref records that point to `oid`, + * but using the object index. + */ +struct indexed_table_ref_iter { + struct reftable_reader *r; + struct strbuf oid; + + /* mutable */ + uint64_t *offsets; + + /* Points to the next offset to read. */ + int offset_idx; + int offset_len; + struct block_reader block_reader; + struct block_iter cur; + int is_finished; +}; + +#define INDEXED_TABLE_REF_ITER_INIT \ + { \ + .cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \ + } + +void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it, + struct indexed_table_ref_iter *itr); + +/* Takes ownership of `offsets` */ +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest, + struct reftable_reader *r, uint8_t *oid, + int oid_len, uint64_t *offsets, int offset_len); + +#endif diff --git a/reftable/reader.c b/reftable/reader.c new file mode 100644 index 00000000000..ab7a981eb8e --- /dev/null +++ b/reftable/reader.c @@ -0,0 +1,801 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "reader.h" + +#include "system.h" +#include "block.h" +#include "constants.h" +#include "generic.h" +#include "iter.h" +#include "record.h" +#include "reftable-error.h" +#include "reftable-generic.h" +#include "tree.h" + +uint64_t block_source_size(struct reftable_block_source *source) +{ + return source->ops->size(source->arg); +} + +int block_source_read_block(struct reftable_block_source *source, + struct reftable_block *dest, uint64_t off, + uint32_t size) +{ + int result = source->ops->read_block(source->arg, dest, off, size); + dest->source = *source; + return result; +} + +void block_source_close(struct reftable_block_source *source) +{ + if (!source->ops) { + return; + } + + source->ops->close(source->arg); + source->ops = NULL; +} + +static struct reftable_reader_offsets * +reader_offsets_for(struct reftable_reader *r, uint8_t typ) +{ + switch (typ) { + case BLOCK_TYPE_REF: + return &r->ref_offsets; + case BLOCK_TYPE_LOG: + return &r->log_offsets; + case BLOCK_TYPE_OBJ: + return &r->obj_offsets; + } + abort(); +} + +static int reader_get_block(struct reftable_reader *r, + struct reftable_block *dest, uint64_t off, + uint32_t sz) +{ + if (off >= r->size) + return 0; + + if (off + sz > r->size) { + sz = r->size - off; + } + + return block_source_read_block(&r->source, dest, off, sz); +} + +uint32_t reftable_reader_hash_id(struct reftable_reader *r) +{ + return r->hash_id; +} + +const char *reader_name(struct reftable_reader *r) +{ + return r->name; +} + +static int parse_footer(struct reftable_reader *r, uint8_t *footer, + uint8_t *header) +{ + uint8_t *f = footer; + uint8_t first_block_typ; + int err = 0; + uint32_t computed_crc; + uint32_t file_crc; + + if (memcmp(f, "REFT", 4)) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + f += 4; + + if (memcmp(footer, header, header_size(r->version))) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + + f++; + r->block_size = get_be24(f); + + f += 3; + r->min_update_index = get_be64(f); + f += 8; + r->max_update_index = get_be64(f); + f += 8; + + if (r->version == 1) { + r->hash_id = GIT_SHA1_FORMAT_ID; + } else { + r->hash_id = get_be32(f); + switch (r->hash_id) { + case GIT_SHA1_FORMAT_ID: + break; + case GIT_SHA256_FORMAT_ID: + break; + default: + err = REFTABLE_FORMAT_ERROR; + goto done; + } + f += 4; + } + + r->ref_offsets.index_offset = get_be64(f); + f += 8; + + r->obj_offsets.offset = get_be64(f); + f += 8; + + r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1); + r->obj_offsets.offset >>= 5; + + r->obj_offsets.index_offset = get_be64(f); + f += 8; + r->log_offsets.offset = get_be64(f); + f += 8; + r->log_offsets.index_offset = get_be64(f); + f += 8; + + computed_crc = crc32(0, footer, f - footer); + file_crc = get_be32(f); + f += 4; + if (computed_crc != file_crc) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + + first_block_typ = header[header_size(r->version)]; + r->ref_offsets.is_present = (first_block_typ == BLOCK_TYPE_REF); + r->ref_offsets.offset = 0; + r->log_offsets.is_present = (first_block_typ == BLOCK_TYPE_LOG || + r->log_offsets.offset > 0); + r->obj_offsets.is_present = r->obj_offsets.offset > 0; + err = 0; +done: + return err; +} + +int init_reader(struct reftable_reader *r, struct reftable_block_source *source, + const char *name) +{ + struct reftable_block footer = { NULL }; + struct reftable_block header = { NULL }; + int err = 0; + uint64_t file_size = block_source_size(source); + + /* Need +1 to read type of first block. */ + uint32_t read_size = header_size(2) + 1; /* read v2 because it's larger. */ + memset(r, 0, sizeof(struct reftable_reader)); + + if (read_size > file_size) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + + err = block_source_read_block(source, &header, 0, read_size); + if (err != read_size) { + err = REFTABLE_IO_ERROR; + goto done; + } + + if (memcmp(header.data, "REFT", 4)) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + r->version = header.data[4]; + if (r->version != 1 && r->version != 2) { + err = REFTABLE_FORMAT_ERROR; + goto done; + } + + r->size = file_size - footer_size(r->version); + r->source = *source; + r->name = xstrdup(name); + r->hash_id = 0; + + err = block_source_read_block(source, &footer, r->size, + footer_size(r->version)); + if (err != footer_size(r->version)) { + err = REFTABLE_IO_ERROR; + goto done; + } + + err = parse_footer(r, footer.data, header.data); +done: + reftable_block_done(&footer); + reftable_block_done(&header); + return err; +} + +struct table_iter { + struct reftable_reader *r; + uint8_t typ; + uint64_t block_off; + struct block_iter bi; + int is_finished; +}; +#define TABLE_ITER_INIT \ + { \ + .bi = {.last_key = STRBUF_INIT } \ + } + +static void table_iter_copy_from(struct table_iter *dest, + struct table_iter *src) +{ + dest->r = src->r; + dest->typ = src->typ; + dest->block_off = src->block_off; + dest->is_finished = src->is_finished; + block_iter_copy_from(&dest->bi, &src->bi); +} + +static int table_iter_next_in_block(struct table_iter *ti, + struct reftable_record *rec) +{ + int res = block_iter_next(&ti->bi, rec); + if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) { + ((struct reftable_ref_record *)rec->data)->update_index += + ti->r->min_update_index; + } + + return res; +} + +static void table_iter_block_done(struct table_iter *ti) +{ + if (!ti->bi.br) { + return; + } + reftable_block_done(&ti->bi.br->block); + FREE_AND_NULL(ti->bi.br); + + ti->bi.last_key.len = 0; + ti->bi.next_off = 0; +} + +static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off, + int version) +{ + int32_t result = 0; + + if (off == 0) { + data += header_size(version); + } + + *typ = data[0]; + if (reftable_is_block_type(*typ)) { + result = get_be24(data + 1); + } + return result; +} + +int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br, + uint64_t next_off, uint8_t want_typ) +{ + int32_t guess_block_size = r->block_size ? r->block_size : + DEFAULT_BLOCK_SIZE; + struct reftable_block block = { NULL }; + uint8_t block_typ = 0; + int err = 0; + uint32_t header_off = next_off ? 0 : header_size(r->version); + int32_t block_size = 0; + + if (next_off >= r->size) + return 1; + + err = reader_get_block(r, &block, next_off, guess_block_size); + if (err < 0) + return err; + + block_size = extract_block_size(block.data, &block_typ, next_off, + r->version); + if (block_size < 0) + return block_size; + + if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) { + reftable_block_done(&block); + return 1; + } + + if (block_size > guess_block_size) { + reftable_block_done(&block); + err = reader_get_block(r, &block, next_off, block_size); + if (err < 0) { + return err; + } + } + + return block_reader_init(br, &block, header_off, r->block_size, + hash_size(r->hash_id)); +} + +static int table_iter_next_block(struct table_iter *dest, + struct table_iter *src) +{ + uint64_t next_block_off = src->block_off + src->bi.br->full_block_size; + struct block_reader br = { 0 }; + int err = 0; + + dest->r = src->r; + dest->typ = src->typ; + dest->block_off = next_block_off; + + err = reader_init_block_reader(src->r, &br, next_block_off, src->typ); + if (err > 0) { + dest->is_finished = 1; + return 1; + } + if (err != 0) + return err; + else { + struct block_reader *brp = + reftable_malloc(sizeof(struct block_reader)); + *brp = br; + + dest->is_finished = 0; + block_reader_start(brp, &dest->bi); + } + return 0; +} + +static int table_iter_next(struct table_iter *ti, struct reftable_record *rec) +{ + if (reftable_record_type(rec) != ti->typ) + return REFTABLE_API_ERROR; + + while (1) { + struct table_iter next = TABLE_ITER_INIT; + int err = 0; + if (ti->is_finished) { + return 1; + } + + err = table_iter_next_in_block(ti, rec); + if (err <= 0) { + return err; + } + + err = table_iter_next_block(&next, ti); + if (err != 0) { + ti->is_finished = 1; + } + table_iter_block_done(ti); + if (err != 0) { + return err; + } + table_iter_copy_from(ti, &next); + block_iter_close(&next.bi); + } +} + +static int table_iter_next_void(void *ti, struct reftable_record *rec) +{ + return table_iter_next(ti, rec); +} + +static void table_iter_close(void *p) +{ + struct table_iter *ti = p; + table_iter_block_done(ti); + block_iter_close(&ti->bi); +} + +static struct reftable_iterator_vtable table_iter_vtable = { + .next = &table_iter_next_void, + .close = &table_iter_close, +}; + +static void iterator_from_table_iter(struct reftable_iterator *it, + struct table_iter *ti) +{ + assert(!it->ops); + it->iter_arg = ti; + it->ops = &table_iter_vtable; +} + +static int reader_table_iter_at(struct reftable_reader *r, + struct table_iter *ti, uint64_t off, + uint8_t typ) +{ + struct block_reader br = { 0 }; + struct block_reader *brp = NULL; + + int err = reader_init_block_reader(r, &br, off, typ); + if (err != 0) + return err; + + brp = reftable_malloc(sizeof(struct block_reader)); + *brp = br; + ti->r = r; + ti->typ = block_reader_type(brp); + ti->block_off = off; + block_reader_start(brp, &ti->bi); + return 0; +} + +static int reader_start(struct reftable_reader *r, struct table_iter *ti, + uint8_t typ, int index) +{ + struct reftable_reader_offsets *offs = reader_offsets_for(r, typ); + uint64_t off = offs->offset; + if (index) { + off = offs->index_offset; + if (off == 0) { + return 1; + } + typ = BLOCK_TYPE_INDEX; + } + + return reader_table_iter_at(r, ti, off, typ); +} + +static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti, + struct reftable_record *want) +{ + struct reftable_record rec = + reftable_new_record(reftable_record_type(want)); + struct strbuf want_key = STRBUF_INIT; + struct strbuf got_key = STRBUF_INIT; + struct table_iter next = TABLE_ITER_INIT; + int err = -1; + + reftable_record_key(want, &want_key); + + while (1) { + err = table_iter_next_block(&next, ti); + if (err < 0) + goto done; + + if (err > 0) { + break; + } + + err = block_reader_first_key(next.bi.br, &got_key); + if (err < 0) + goto done; + + if (strbuf_cmp(&got_key, &want_key) > 0) { + table_iter_block_done(&next); + break; + } + + table_iter_block_done(ti); + table_iter_copy_from(ti, &next); + } + + err = block_iter_seek(&ti->bi, &want_key); + if (err < 0) + goto done; + err = 0; + +done: + block_iter_close(&next.bi); + reftable_record_destroy(&rec); + strbuf_release(&want_key); + strbuf_release(&got_key); + return err; +} + +static int reader_seek_indexed(struct reftable_reader *r, + struct reftable_iterator *it, + struct reftable_record *rec) +{ + struct reftable_index_record want_index = { .last_key = STRBUF_INIT }; + struct reftable_record want_index_rec = { NULL }; + struct reftable_index_record index_result = { .last_key = STRBUF_INIT }; + struct reftable_record index_result_rec = { NULL }; + struct table_iter index_iter = TABLE_ITER_INIT; + struct table_iter next = TABLE_ITER_INIT; + int err = 0; + + reftable_record_key(rec, &want_index.last_key); + reftable_record_from_index(&want_index_rec, &want_index); + reftable_record_from_index(&index_result_rec, &index_result); + + err = reader_start(r, &index_iter, reftable_record_type(rec), 1); + if (err < 0) + goto done; + + err = reader_seek_linear(r, &index_iter, &want_index_rec); + while (1) { + err = table_iter_next(&index_iter, &index_result_rec); + table_iter_block_done(&index_iter); + if (err != 0) + goto done; + + err = reader_table_iter_at(r, &next, index_result.offset, 0); + if (err != 0) + goto done; + + err = block_iter_seek(&next.bi, &want_index.last_key); + if (err < 0) + goto done; + + if (next.typ == reftable_record_type(rec)) { + err = 0; + break; + } + + if (next.typ != BLOCK_TYPE_INDEX) { + err = REFTABLE_FORMAT_ERROR; + break; + } + + table_iter_copy_from(&index_iter, &next); + } + + if (err == 0) { + struct table_iter empty = TABLE_ITER_INIT; + struct table_iter *malloced = + reftable_calloc(sizeof(struct table_iter)); + *malloced = empty; + table_iter_copy_from(malloced, &next); + iterator_from_table_iter(it, malloced); + } +done: + block_iter_close(&next.bi); + table_iter_close(&index_iter); + reftable_record_release(&want_index_rec); + reftable_record_release(&index_result_rec); + return err; +} + +static int reader_seek_internal(struct reftable_reader *r, + struct reftable_iterator *it, + struct reftable_record *rec) +{ + struct reftable_reader_offsets *offs = + reader_offsets_for(r, reftable_record_type(rec)); + uint64_t idx = offs->index_offset; + struct table_iter ti = TABLE_ITER_INIT; + int err = 0; + if (idx > 0) + return reader_seek_indexed(r, it, rec); + + err = reader_start(r, &ti, reftable_record_type(rec), 0); + if (err < 0) + return err; + err = reader_seek_linear(r, &ti, rec); + if (err < 0) + return err; + else { + struct table_iter *p = + reftable_malloc(sizeof(struct table_iter)); + *p = ti; + iterator_from_table_iter(it, p); + } + + return 0; +} + +int reader_seek(struct reftable_reader *r, struct reftable_iterator *it, + struct reftable_record *rec) +{ + uint8_t typ = reftable_record_type(rec); + + struct reftable_reader_offsets *offs = reader_offsets_for(r, typ); + if (!offs->is_present) { + iterator_set_empty(it); + return 0; + } + + return reader_seek_internal(r, it, rec); +} + +int reftable_reader_seek_ref(struct reftable_reader *r, + struct reftable_iterator *it, const char *name) +{ + struct reftable_ref_record ref = { + .refname = (char *)name, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, &ref); + return reader_seek(r, it, &rec); +} + +int reftable_reader_seek_log_at(struct reftable_reader *r, + struct reftable_iterator *it, const char *name, + uint64_t update_index) +{ + struct reftable_log_record log = { + .refname = (char *)name, + .update_index = update_index, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_log(&rec, &log); + return reader_seek(r, it, &rec); +} + +int reftable_reader_seek_log(struct reftable_reader *r, + struct reftable_iterator *it, const char *name) +{ + uint64_t max = ~((uint64_t)0); + return reftable_reader_seek_log_at(r, it, name, max); +} + +void reader_close(struct reftable_reader *r) +{ + block_source_close(&r->source); + FREE_AND_NULL(r->name); +} + +int reftable_new_reader(struct reftable_reader **p, + struct reftable_block_source *src, char const *name) +{ + struct reftable_reader *rd = + reftable_calloc(sizeof(struct reftable_reader)); + int err = init_reader(rd, src, name); + if (err == 0) { + *p = rd; + } else { + block_source_close(src); + reftable_free(rd); + } + return err; +} + +void reftable_reader_free(struct reftable_reader *r) +{ + reader_close(r); + reftable_free(r); +} + +static int reftable_reader_refs_for_indexed(struct reftable_reader *r, + struct reftable_iterator *it, + uint8_t *oid) +{ + struct reftable_obj_record want = { + .hash_prefix = oid, + .hash_prefix_len = r->object_id_len, + }; + struct reftable_record want_rec = { NULL }; + struct reftable_iterator oit = { NULL }; + struct reftable_obj_record got = { NULL }; + struct reftable_record got_rec = { NULL }; + int err = 0; + struct indexed_table_ref_iter *itr = NULL; + + /* Look through the reverse index. */ + reftable_record_from_obj(&want_rec, &want); + err = reader_seek(r, &oit, &want_rec); + if (err != 0) + goto done; + + /* read out the reftable_obj_record */ + reftable_record_from_obj(&got_rec, &got); + err = iterator_next(&oit, &got_rec); + if (err < 0) + goto done; + + if (err > 0 || + memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) { + /* didn't find it; return empty iterator */ + iterator_set_empty(it); + err = 0; + goto done; + } + + err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id), + got.offsets, got.offset_len); + if (err < 0) + goto done; + got.offsets = NULL; + iterator_from_indexed_table_ref_iter(it, itr); + +done: + reftable_iterator_destroy(&oit); + reftable_record_release(&got_rec); + return err; +} + +static int reftable_reader_refs_for_unindexed(struct reftable_reader *r, + struct reftable_iterator *it, + uint8_t *oid) +{ + struct table_iter ti_empty = TABLE_ITER_INIT; + struct table_iter *ti = reftable_calloc(sizeof(struct table_iter)); + struct filtering_ref_iterator *filter = NULL; + struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT; + int oid_len = hash_size(r->hash_id); + int err; + + *ti = ti_empty; + err = reader_start(r, ti, BLOCK_TYPE_REF, 0); + if (err < 0) { + reftable_free(ti); + return err; + } + + filter = reftable_malloc(sizeof(struct filtering_ref_iterator)); + *filter = empty; + + strbuf_add(&filter->oid, oid, oid_len); + reftable_table_from_reader(&filter->tab, r); + filter->double_check = 0; + iterator_from_table_iter(&filter->it, ti); + + iterator_from_filtering_ref_iterator(it, filter); + return 0; +} + +int reftable_reader_refs_for(struct reftable_reader *r, + struct reftable_iterator *it, uint8_t *oid) +{ + if (r->obj_offsets.is_present) + return reftable_reader_refs_for_indexed(r, it, oid); + return reftable_reader_refs_for_unindexed(r, it, oid); +} + +uint64_t reftable_reader_max_update_index(struct reftable_reader *r) +{ + return r->max_update_index; +} + +uint64_t reftable_reader_min_update_index(struct reftable_reader *r) +{ + return r->min_update_index; +} + +/* generic table interface. */ + +static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it, + struct reftable_record *rec) +{ + return reader_seek(tab, it, rec); +} + +static uint32_t reftable_reader_hash_id_void(void *tab) +{ + return reftable_reader_hash_id(tab); +} + +static uint64_t reftable_reader_min_update_index_void(void *tab) +{ + return reftable_reader_min_update_index(tab); +} + +static uint64_t reftable_reader_max_update_index_void(void *tab) +{ + return reftable_reader_max_update_index(tab); +} + +static struct reftable_table_vtable reader_vtable = { + .seek_record = reftable_reader_seek_void, + .hash_id = reftable_reader_hash_id_void, + .min_update_index = reftable_reader_min_update_index_void, + .max_update_index = reftable_reader_max_update_index_void, +}; + +void reftable_table_from_reader(struct reftable_table *tab, + struct reftable_reader *reader) +{ + assert(!tab->ops); + tab->ops = &reader_vtable; + tab->table_arg = reader; +} + + +int reftable_reader_print_file(const char *tablename) +{ + struct reftable_block_source src = { NULL }; + int err = reftable_block_source_from_file(&src, tablename); + struct reftable_reader *r = NULL; + struct reftable_table tab = { NULL }; + if (err < 0) + goto done; + + err = reftable_new_reader(&r, &src, tablename); + if (err < 0) + goto done; + + reftable_table_from_reader(&tab, r); + err = reftable_table_print(&tab); +done: + reftable_reader_free(r); + return err; +} diff --git a/reftable/reader.h b/reftable/reader.h new file mode 100644 index 00000000000..39583e5dbcd --- /dev/null +++ b/reftable/reader.h @@ -0,0 +1,66 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef READER_H +#define READER_H + +#include "block.h" +#include "record.h" +#include "reftable-iterator.h" +#include "reftable-reader.h" + +uint64_t block_source_size(struct reftable_block_source *source); + +int block_source_read_block(struct reftable_block_source *source, + struct reftable_block *dest, uint64_t off, + uint32_t size); +void block_source_close(struct reftable_block_source *source); + +/* metadata for a block type */ +struct reftable_reader_offsets { + int is_present; + uint64_t offset; + uint64_t index_offset; +}; + +/* The state for reading a reftable file. */ +struct reftable_reader { + /* for convience, associate a name with the instance. */ + char *name; + struct reftable_block_source source; + + /* Size of the file, excluding the footer. */ + uint64_t size; + + /* 'sha1' for SHA1, 's256' for SHA-256 */ + uint32_t hash_id; + + uint32_t block_size; + uint64_t min_update_index; + uint64_t max_update_index; + /* Length of the OID keys in the 'o' section */ + int object_id_len; + int version; + + struct reftable_reader_offsets ref_offsets; + struct reftable_reader_offsets obj_offsets; + struct reftable_reader_offsets log_offsets; +}; + +int init_reader(struct reftable_reader *r, struct reftable_block_source *source, + const char *name); +int reader_seek(struct reftable_reader *r, struct reftable_iterator *it, + struct reftable_record *rec); +void reader_close(struct reftable_reader *r); +const char *reader_name(struct reftable_reader *r); + +/* initialize a block reader to read from `r` */ +int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br, + uint64_t next_off, uint8_t want_typ); + +#endif diff --git a/reftable/reftable-reader.h b/reftable/reftable-reader.h new file mode 100644 index 00000000000..4a4bc2fdf85 --- /dev/null +++ b/reftable/reftable-reader.h @@ -0,0 +1,101 @@ +/* + Copyright 2020 Google LLC + + Use of this source code is governed by a BSD-style + license that can be found in the LICENSE file or at + https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_READER_H +#define REFTABLE_READER_H + +#include "reftable-iterator.h" +#include "reftable-blocksource.h" + +/* + * Reading single tables + * + * The follow routines are for reading single files. For an + * application-level interface, skip ahead to struct + * reftable_merged_table and struct reftable_stack. + */ + +/* The reader struct is a handle to an open reftable file. */ +struct reftable_reader; + +/* Generic table. */ +struct reftable_table; + +/* reftable_new_reader opens a reftable for reading. If successful, + * returns 0 code and sets pp. The name is used for creating a + * stack. Typically, it is the basename of the file. The block source + * `src` is owned by the reader, and is closed on calling + * reftable_reader_destroy(). On error, the block source `src` is + * closed as well. + */ +int reftable_new_reader(struct reftable_reader **pp, + struct reftable_block_source *src, const char *name); + +/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted + in the table. To seek to the start of the table, use name = "". + + example: + + struct reftable_reader *r = NULL; + int err = reftable_new_reader(&r, &src, "filename"); + if (err < 0) { ... } + struct reftable_iterator it = {0}; + err = reftable_reader_seek_ref(r, &it, "refs/heads/master"); + if (err < 0) { ... } + struct reftable_ref_record ref = {0}; + while (1) { + err = reftable_iterator_next_ref(&it, &ref); + if (err > 0) { + break; + } + if (err < 0) { + ..error handling.. + } + ..found.. + } + reftable_iterator_destroy(&it); + reftable_ref_record_release(&ref); +*/ +int reftable_reader_seek_ref(struct reftable_reader *r, + struct reftable_iterator *it, const char *name); + +/* returns the hash ID used in this table. */ +uint32_t reftable_reader_hash_id(struct reftable_reader *r); + +/* seek to logs for the given name, older than update_index. To seek to the + start of the table, use name = "". +*/ +int reftable_reader_seek_log_at(struct reftable_reader *r, + struct reftable_iterator *it, const char *name, + uint64_t update_index); + +/* seek to newest log entry for given name. */ +int reftable_reader_seek_log(struct reftable_reader *r, + struct reftable_iterator *it, const char *name); + +/* closes and deallocates a reader. */ +void reftable_reader_free(struct reftable_reader *); + +/* return an iterator for the refs pointing to `oid`. */ +int reftable_reader_refs_for(struct reftable_reader *r, + struct reftable_iterator *it, uint8_t *oid); + +/* return the max_update_index for a table */ +uint64_t reftable_reader_max_update_index(struct reftable_reader *r); + +/* return the min_update_index for a table */ +uint64_t reftable_reader_min_update_index(struct reftable_reader *r); + +/* creates a generic table from a file reader. */ +void reftable_table_from_reader(struct reftable_table *tab, + struct reftable_reader *reader); + +/* print table onto stdout for debugging. */ +int reftable_reader_print_file(const char *tablename); + +#endif From patchwork Thu Sep 9 18:47:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBB48C433F5 for ; Thu, 9 Sep 2021 18:49:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D69B361100 for ; Thu, 9 Sep 2021 18:49:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237317AbhIISuJ (ORCPT ); Thu, 9 Sep 2021 14:50:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48404 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245345AbhIIStL (ORCPT ); Thu, 9 Sep 2021 14:49:11 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D28E4C0613EF for ; Thu, 9 Sep 2021 11:47:55 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id n5so3929671wro.12 for ; Thu, 09 Sep 2021 11:47:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=Mi/s+/c8x0VA/Jsi/B1FqNdIfzWRHbuABM/O2YziG+I=; b=aKhNiEj4iPTzZ8n5WCGKo8BgeAWJ9gQjqOsugwqNzFTM8eC6jw5N+5I41R5WODdHMa 0elmZ8++kXMxS1T+4EZ/bCZVmwPSYxOskcB7XArXU0Rh7HCmBDzHIZvkEMdHbFj/3O5L eBX08JX5L3X2lCrg0HQ0H7GGvlawlJcNH5DXokgOsoJL7BFxLs3y7Y3zb0p/UefrYpD2 hUPv6UnENcXIU4JI5s2JNikiRBNT5OS+XErHfSm5Srh0f7msOXA+33XodXr32wmrOT41 IgoixkPEEqP8Hd9Gg8SMiEIQuX8XtNVz4wbTmJYo5xCbbpx3XRzndZ9JisoayiVzuzop VgKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=Mi/s+/c8x0VA/Jsi/B1FqNdIfzWRHbuABM/O2YziG+I=; b=KMTf7Zhb2GjG1a3EI4v0AruBnE6t7pq3CJFLYHVttNHcu9y4QYPe4TP08IaNxxCy5o 6jPu31aczpcVA2insUnWtwcsbtanrsvW6oycOJpFofAHaxeoWmDUR1gOItfcSi8PzjPW GkEfqt8O9fxF1rpLvK31oeifdXppbUHodbRi/EwKYCTj3EaQqdfJnaijbshgnbW0BaPj Ic5xDdLeP1PsX2nDDG+JsVXVEIvDdyS4TGKR3V7ByEQibrJ7nnjx3gGZ9P8BFc+U7YpD NLNP0T+lmuhOA6FTsOOdEz3pLSJ625N+7A94P5fZbb4YU8cr6UxmlOqqsObX7NVfapiP 4fAw== X-Gm-Message-State: AOAM531z43N8XYHwAr6HwzaNMEhb+MHFR9cu5tyKJkcAsLl69wbmOOWN Z1sILpvpOJCDpEF/LjO93rU1fOZLwwM= X-Google-Smtp-Source: ABdhPJyQPGy3edBZZ2fjyHPrXvmIbIPKFBaW5f0zp4gBju30uY2+rN0E2D2Pte4rr6Wi0U28uBGDHQ== X-Received: by 2002:a5d:4991:: with SMTP id r17mr5587460wrq.247.1631213274394; Thu, 09 Sep 2021 11:47:54 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 48sm479514wrc.14.2021.09.09.11.47.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:54 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:38 +0000 Subject: [PATCH v2 13/19] reftable: reftable file level tests Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys With support for reading and writing files in place, we can construct files (in memory) and attempt to read them back. Because some sections of the format are optional (eg. indices, log entries), we have to exercise this code using multiple sizes of input data Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + reftable/readwrite_test.c | 652 ++++++++++++++++++++++++++++++++++++++ reftable/reftable-tests.h | 2 +- t/helper/test-reftable.c | 1 + 4 files changed, 655 insertions(+), 1 deletion(-) create mode 100644 reftable/readwrite_test.c diff --git a/Makefile b/Makefile index c057c23d4a8..1c0bb449027 100644 --- a/Makefile +++ b/Makefile @@ -2473,6 +2473,7 @@ REFTABLE_OBJS += reftable/writer.o REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o REFTABLE_TEST_OBJS += reftable/record_test.o +REFTABLE_TEST_OBJS += reftable/readwrite_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o REFTABLE_TEST_OBJS += reftable/tree_test.o diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c new file mode 100644 index 00000000000..5f6bcc2f775 --- /dev/null +++ b/reftable/readwrite_test.c @@ -0,0 +1,652 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" + +#include "basics.h" +#include "block.h" +#include "blocksource.h" +#include "constants.h" +#include "reader.h" +#include "record.h" +#include "test_framework.h" +#include "reftable-tests.h" +#include "reftable-writer.h" + +static const int update_index = 5; + +static void test_buffer(void) +{ + struct strbuf buf = STRBUF_INIT; + struct reftable_block_source source = { NULL }; + struct reftable_block out = { NULL }; + int n; + uint8_t in[] = "hello"; + strbuf_add(&buf, in, sizeof(in)); + block_source_from_strbuf(&source, &buf); + EXPECT(block_source_size(&source) == 6); + n = block_source_read_block(&source, &out, 0, sizeof(in)); + EXPECT(n == sizeof(in)); + EXPECT(!memcmp(in, out.data, n)); + reftable_block_done(&out); + + n = block_source_read_block(&source, &out, 1, 2); + EXPECT(n == 2); + EXPECT(!memcmp(out.data, "el", 2)); + + reftable_block_done(&out); + block_source_close(&source); + strbuf_release(&buf); +} + +static void write_table(char ***names, struct strbuf *buf, int N, + int block_size, uint32_t hash_id) +{ + struct reftable_write_options opts = { + .block_size = block_size, + .hash_id = hash_id, + }; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, buf, &opts); + struct reftable_ref_record ref = { NULL }; + int i = 0, n; + struct reftable_log_record log = { NULL }; + const struct reftable_stats *stats = NULL; + *names = reftable_calloc(sizeof(char *) * (N + 1)); + reftable_writer_set_limits(w, update_index, update_index); + for (i = 0; i < N; i++) { + uint8_t hash[GIT_SHA256_RAWSZ] = { 0 }; + char name[100]; + int n; + + set_test_hash(hash, i); + + snprintf(name, sizeof(name), "refs/heads/branch%02d", i); + + ref.refname = name; + ref.update_index = update_index; + ref.value_type = REFTABLE_REF_VAL1; + ref.value.val1 = hash; + (*names)[i] = xstrdup(name); + + n = reftable_writer_add_ref(w, &ref); + EXPECT(n == 0); + } + + for (i = 0; i < N; i++) { + uint8_t hash[GIT_SHA256_RAWSZ] = { 0 }; + char name[100]; + int n; + + set_test_hash(hash, i); + + snprintf(name, sizeof(name), "refs/heads/branch%02d", i); + + log.refname = name; + log.update_index = update_index; + log.value_type = REFTABLE_LOG_UPDATE; + log.value.update.new_hash = hash; + log.value.update.message = "message"; + + n = reftable_writer_add_log(w, &log); + EXPECT(n == 0); + } + + n = reftable_writer_close(w); + EXPECT(n == 0); + + stats = writer_stats(w); + for (i = 0; i < stats->ref_stats.blocks; i++) { + int off = i * opts.block_size; + if (off == 0) { + off = header_size( + (hash_id == GIT_SHA256_FORMAT_ID) ? 2 : 1); + } + EXPECT(buf->buf[off] == 'r'); + } + + EXPECT(stats->log_stats.blocks > 0); + reftable_writer_free(w); +} + +static void test_log_buffer_size(void) +{ + struct strbuf buf = STRBUF_INIT; + struct reftable_write_options opts = { + .block_size = 4096, + }; + int err; + int i; + struct reftable_log_record + log = { .refname = "refs/heads/master", + .update_index = 0xa, + .value_type = REFTABLE_LOG_UPDATE, + .value = { .update = { + .name = "Han-Wen Nienhuys", + .email = "hanwen@google.com", + .tz_offset = 100, + .time = 0x5e430672, + .message = "commit: 9\n", + } } }; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + + /* This tests buffer extension for log compression. Must use a random + hash, to ensure that the compressed part is larger than the original. + */ + uint8_t hash1[GIT_SHA1_RAWSZ], hash2[GIT_SHA1_RAWSZ]; + for (i = 0; i < GIT_SHA1_RAWSZ; i++) { + hash1[i] = (uint8_t)(rand() % 256); + hash2[i] = (uint8_t)(rand() % 256); + } + log.value.update.old_hash = hash1; + log.value.update.new_hash = hash2; + reftable_writer_set_limits(w, update_index, update_index); + err = reftable_writer_add_log(w, &log); + EXPECT_ERR(err); + err = reftable_writer_close(w); + EXPECT_ERR(err); + reftable_writer_free(w); + strbuf_release(&buf); +} + +static void test_log_write_read(void) +{ + int N = 2; + char **names = reftable_calloc(sizeof(char *) * (N + 1)); + int err; + struct reftable_write_options opts = { + .block_size = 256, + }; + struct reftable_ref_record ref = { NULL }; + int i = 0; + struct reftable_log_record log = { NULL }; + int n; + struct reftable_iterator it = { NULL }; + struct reftable_reader rd = { NULL }; + struct reftable_block_source source = { NULL }; + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + const struct reftable_stats *stats = NULL; + reftable_writer_set_limits(w, 0, N); + for (i = 0; i < N; i++) { + char name[256]; + struct reftable_ref_record ref = { NULL }; + snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7); + names[i] = xstrdup(name); + ref.refname = name; + ref.update_index = i; + + err = reftable_writer_add_ref(w, &ref); + EXPECT_ERR(err); + } + for (i = 0; i < N; i++) { + uint8_t hash1[GIT_SHA1_RAWSZ], hash2[GIT_SHA1_RAWSZ]; + struct reftable_log_record log = { NULL }; + set_test_hash(hash1, i); + set_test_hash(hash2, i + 1); + + log.refname = names[i]; + log.update_index = i; + log.value_type = REFTABLE_LOG_UPDATE; + log.value.update.old_hash = hash1; + log.value.update.new_hash = hash2; + + err = reftable_writer_add_log(w, &log); + EXPECT_ERR(err); + } + + n = reftable_writer_close(w); + EXPECT(n == 0); + + stats = writer_stats(w); + EXPECT(stats->log_stats.blocks > 0); + reftable_writer_free(w); + w = NULL; + + block_source_from_strbuf(&source, &buf); + + err = init_reader(&rd, &source, "file.log"); + EXPECT_ERR(err); + + err = reftable_reader_seek_ref(&rd, &it, names[N - 1]); + EXPECT_ERR(err); + + err = reftable_iterator_next_ref(&it, &ref); + EXPECT_ERR(err); + + /* end of iteration. */ + err = reftable_iterator_next_ref(&it, &ref); + EXPECT(0 < err); + + reftable_iterator_destroy(&it); + reftable_ref_record_release(&ref); + + err = reftable_reader_seek_log(&rd, &it, ""); + EXPECT_ERR(err); + + i = 0; + while (1) { + int err = reftable_iterator_next_log(&it, &log); + if (err > 0) { + break; + } + + EXPECT_ERR(err); + EXPECT_STREQ(names[i], log.refname); + EXPECT(i == log.update_index); + i++; + reftable_log_record_release(&log); + } + + EXPECT(i == N); + reftable_iterator_destroy(&it); + + /* cleanup. */ + strbuf_release(&buf); + free_names(names); + reader_close(&rd); +} + +static void test_table_read_write_sequential(void) +{ + char **names; + struct strbuf buf = STRBUF_INIT; + int N = 50; + struct reftable_iterator it = { NULL }; + struct reftable_block_source source = { NULL }; + struct reftable_reader rd = { NULL }; + int err = 0; + int j = 0; + + write_table(&names, &buf, N, 256, GIT_SHA1_FORMAT_ID); + + block_source_from_strbuf(&source, &buf); + + err = init_reader(&rd, &source, "file.ref"); + EXPECT_ERR(err); + + err = reftable_reader_seek_ref(&rd, &it, ""); + EXPECT_ERR(err); + + while (1) { + struct reftable_ref_record ref = { NULL }; + int r = reftable_iterator_next_ref(&it, &ref); + EXPECT(r >= 0); + if (r > 0) { + break; + } + EXPECT(0 == strcmp(names[j], ref.refname)); + EXPECT(update_index == ref.update_index); + + j++; + reftable_ref_record_release(&ref); + } + EXPECT(j == N); + reftable_iterator_destroy(&it); + strbuf_release(&buf); + free_names(names); + + reader_close(&rd); +} + +static void test_table_write_small_table(void) +{ + char **names; + struct strbuf buf = STRBUF_INIT; + int N = 1; + write_table(&names, &buf, N, 4096, GIT_SHA1_FORMAT_ID); + EXPECT(buf.len < 200); + strbuf_release(&buf); + free_names(names); +} + +static void test_table_read_api(void) +{ + char **names; + struct strbuf buf = STRBUF_INIT; + int N = 50; + struct reftable_reader rd = { NULL }; + struct reftable_block_source source = { NULL }; + int err; + int i; + struct reftable_log_record log = { NULL }; + struct reftable_iterator it = { NULL }; + + write_table(&names, &buf, N, 256, GIT_SHA1_FORMAT_ID); + + block_source_from_strbuf(&source, &buf); + + err = init_reader(&rd, &source, "file.ref"); + EXPECT_ERR(err); + + err = reftable_reader_seek_ref(&rd, &it, names[0]); + EXPECT_ERR(err); + + err = reftable_iterator_next_log(&it, &log); + EXPECT(err == REFTABLE_API_ERROR); + + strbuf_release(&buf); + for (i = 0; i < N; i++) { + reftable_free(names[i]); + } + reftable_iterator_destroy(&it); + reftable_free(names); + reader_close(&rd); + strbuf_release(&buf); +} + +static void test_table_read_write_seek(int index, int hash_id) +{ + char **names; + struct strbuf buf = STRBUF_INIT; + int N = 50; + struct reftable_reader rd = { NULL }; + struct reftable_block_source source = { NULL }; + int err; + int i = 0; + + struct reftable_iterator it = { NULL }; + struct strbuf pastLast = STRBUF_INIT; + struct reftable_ref_record ref = { NULL }; + + write_table(&names, &buf, N, 256, hash_id); + + block_source_from_strbuf(&source, &buf); + + err = init_reader(&rd, &source, "file.ref"); + EXPECT_ERR(err); + EXPECT(hash_id == reftable_reader_hash_id(&rd)); + + if (!index) { + rd.ref_offsets.index_offset = 0; + } else { + EXPECT(rd.ref_offsets.index_offset > 0); + } + + for (i = 1; i < N; i++) { + int err = reftable_reader_seek_ref(&rd, &it, names[i]); + EXPECT_ERR(err); + err = reftable_iterator_next_ref(&it, &ref); + EXPECT_ERR(err); + EXPECT(0 == strcmp(names[i], ref.refname)); + EXPECT(REFTABLE_REF_VAL1 == ref.value_type); + EXPECT(i == ref.value.val1[0]); + + reftable_ref_record_release(&ref); + reftable_iterator_destroy(&it); + } + + strbuf_addstr(&pastLast, names[N - 1]); + strbuf_addstr(&pastLast, "/"); + + err = reftable_reader_seek_ref(&rd, &it, pastLast.buf); + if (err == 0) { + struct reftable_ref_record ref = { NULL }; + int err = reftable_iterator_next_ref(&it, &ref); + EXPECT(err > 0); + } else { + EXPECT(err > 0); + } + + strbuf_release(&pastLast); + reftable_iterator_destroy(&it); + + strbuf_release(&buf); + for (i = 0; i < N; i++) { + reftable_free(names[i]); + } + reftable_free(names); + reader_close(&rd); +} + +static void test_table_read_write_seek_linear(void) +{ + test_table_read_write_seek(0, GIT_SHA1_FORMAT_ID); +} + +static void test_table_read_write_seek_linear_sha256(void) +{ + test_table_read_write_seek(0, GIT_SHA256_FORMAT_ID); +} + +static void test_table_read_write_seek_index(void) +{ + test_table_read_write_seek(1, GIT_SHA1_FORMAT_ID); +} + +static void test_table_refs_for(int indexed) +{ + int N = 50; + char **want_names = reftable_calloc(sizeof(char *) * (N + 1)); + int want_names_len = 0; + uint8_t want_hash[GIT_SHA1_RAWSZ]; + + struct reftable_write_options opts = { + .block_size = 256, + }; + struct reftable_ref_record ref = { NULL }; + int i = 0; + int n; + int err; + struct reftable_reader rd; + struct reftable_block_source source = { NULL }; + + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + + struct reftable_iterator it = { NULL }; + int j; + + set_test_hash(want_hash, 4); + + for (i = 0; i < N; i++) { + uint8_t hash[GIT_SHA1_RAWSZ]; + char fill[51] = { 0 }; + char name[100]; + uint8_t hash1[GIT_SHA1_RAWSZ]; + uint8_t hash2[GIT_SHA1_RAWSZ]; + struct reftable_ref_record ref = { NULL }; + + memset(hash, i, sizeof(hash)); + memset(fill, 'x', 50); + /* Put the variable part in the start */ + snprintf(name, sizeof(name), "br%02d%s", i, fill); + name[40] = 0; + ref.refname = name; + + set_test_hash(hash1, i / 4); + set_test_hash(hash2, 3 + i / 4); + ref.value_type = REFTABLE_REF_VAL2; + ref.value.val2.value = hash1; + ref.value.val2.target_value = hash2; + + /* 80 bytes / entry, so 3 entries per block. Yields 17 + */ + /* blocks. */ + n = reftable_writer_add_ref(w, &ref); + EXPECT(n == 0); + + if (!memcmp(hash1, want_hash, GIT_SHA1_RAWSZ) || + !memcmp(hash2, want_hash, GIT_SHA1_RAWSZ)) { + want_names[want_names_len++] = xstrdup(name); + } + } + + n = reftable_writer_close(w); + EXPECT(n == 0); + + reftable_writer_free(w); + w = NULL; + + block_source_from_strbuf(&source, &buf); + + err = init_reader(&rd, &source, "file.ref"); + EXPECT_ERR(err); + if (!indexed) { + rd.obj_offsets.is_present = 0; + } + + err = reftable_reader_seek_ref(&rd, &it, ""); + EXPECT_ERR(err); + reftable_iterator_destroy(&it); + + err = reftable_reader_refs_for(&rd, &it, want_hash); + EXPECT_ERR(err); + + j = 0; + while (1) { + int err = reftable_iterator_next_ref(&it, &ref); + EXPECT(err >= 0); + if (err > 0) { + break; + } + + EXPECT(j < want_names_len); + EXPECT(0 == strcmp(ref.refname, want_names[j])); + j++; + reftable_ref_record_release(&ref); + } + EXPECT(j == want_names_len); + + strbuf_release(&buf); + free_names(want_names); + reftable_iterator_destroy(&it); + reader_close(&rd); +} + +static void test_table_refs_for_no_index(void) +{ + test_table_refs_for(0); +} + +static void test_table_refs_for_obj_index(void) +{ + test_table_refs_for(1); +} + +static void test_write_empty_table(void) +{ + struct reftable_write_options opts = { 0 }; + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + struct reftable_block_source source = { NULL }; + struct reftable_reader *rd = NULL; + struct reftable_ref_record rec = { NULL }; + struct reftable_iterator it = { NULL }; + int err; + + reftable_writer_set_limits(w, 1, 1); + + err = reftable_writer_close(w); + EXPECT(err == REFTABLE_EMPTY_TABLE_ERROR); + reftable_writer_free(w); + + EXPECT(buf.len == header_size(1) + footer_size(1)); + + block_source_from_strbuf(&source, &buf); + + err = reftable_new_reader(&rd, &source, "filename"); + EXPECT_ERR(err); + + err = reftable_reader_seek_ref(rd, &it, ""); + EXPECT_ERR(err); + + err = reftable_iterator_next_ref(&it, &rec); + EXPECT(err > 0); + + reftable_iterator_destroy(&it); + reftable_reader_free(rd); + strbuf_release(&buf); +} + +static void test_write_key_order(void) +{ + struct reftable_write_options opts = { 0 }; + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + struct reftable_ref_record refs[2] = { + { + .refname = "b", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value = { + .symref = "target", + }, + }, { + .refname = "a", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value = { + .symref = "target", + }, + } + }; + int err; + + reftable_writer_set_limits(w, 1, 1); + err = reftable_writer_add_ref(w, &refs[0]); + EXPECT_ERR(err); + err = reftable_writer_add_ref(w, &refs[1]); + printf("%d\n", err); + EXPECT(err == REFTABLE_API_ERROR); + reftable_writer_close(w); + reftable_writer_free(w); + strbuf_release(&buf); +} + +static void test_corrupt_table_empty(void) +{ + struct strbuf buf = STRBUF_INIT; + struct reftable_block_source source = { NULL }; + struct reftable_reader rd = { NULL }; + int err; + + block_source_from_strbuf(&source, &buf); + err = init_reader(&rd, &source, "file.log"); + EXPECT(err == REFTABLE_FORMAT_ERROR); +} + +static void test_corrupt_table(void) +{ + uint8_t zeros[1024] = { 0 }; + struct strbuf buf = STRBUF_INIT; + struct reftable_block_source source = { NULL }; + struct reftable_reader rd = { NULL }; + int err; + strbuf_add(&buf, zeros, sizeof(zeros)); + + block_source_from_strbuf(&source, &buf); + err = init_reader(&rd, &source, "file.log"); + EXPECT(err == REFTABLE_FORMAT_ERROR); + strbuf_release(&buf); +} + +int readwrite_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_corrupt_table); + RUN_TEST(test_corrupt_table_empty); + RUN_TEST(test_log_write_read); + RUN_TEST(test_write_key_order); + RUN_TEST(test_table_read_write_seek_linear_sha256); + RUN_TEST(test_log_buffer_size); + RUN_TEST(test_table_write_small_table); + RUN_TEST(test_buffer); + RUN_TEST(test_table_read_api); + RUN_TEST(test_table_read_write_sequential); + RUN_TEST(test_table_read_write_seek_linear); + RUN_TEST(test_table_read_write_seek_index); + RUN_TEST(test_table_refs_for_no_index); + RUN_TEST(test_table_refs_for_obj_index); + RUN_TEST(test_write_empty_table); + return 0; +} diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h index 5e7698ae654..3d541fa5c0c 100644 --- a/reftable/reftable-tests.h +++ b/reftable/reftable-tests.h @@ -14,7 +14,7 @@ int block_test_main(int argc, const char **argv); int merged_test_main(int argc, const char **argv); int record_test_main(int argc, const char **argv); int refname_test_main(int argc, const char **argv); -int reftable_test_main(int argc, const char **argv); +int readwrite_test_main(int argc, const char **argv); int stack_test_main(int argc, const char **argv); int tree_test_main(int argc, const char **argv); int reftable_dump_main(int argc, char *const *argv); diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 050551fa698..898aba836fd 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -6,6 +6,7 @@ int cmd__reftable(int argc, const char **argv) basics_test_main(argc, argv); block_test_main(argc, argv); record_test_main(argc, argv); + readwrite_test_main(argc, argv); tree_test_main(argc, argv); return 0; } From patchwork Thu Sep 9 18:47:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483681 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62380C433EF for ; Thu, 9 Sep 2021 18:48:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4D9AB61100 for ; Thu, 9 Sep 2021 18:48:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343631AbhIISt6 (ORCPT ); Thu, 9 Sep 2021 14:49:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245352AbhIIStL (ORCPT ); Thu, 9 Sep 2021 14:49:11 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5965DC0613F0 for ; Thu, 9 Sep 2021 11:47:56 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id n5so3929683wro.12 for ; Thu, 09 Sep 2021 11:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=NqfiJsyECBc1PhDMJ1UiCYytYgX6tG6TloGUqDwSNms=; b=OLxcQcWTxWND7xCoHVCRdv1GoF6/oXIcoJQTylb2gHVoE91HOsOPTkmlHM9fMn8U4L LbCYuTEjkJNbqlU0rV1BKes2Hb1QWJygdCjhWCNq9Kv04NSiLLYnqSvKvDJM5YQx6u9+ 4/1sfcQ28/4qKy+sCYvNp9M8n4iXAkibq+bfeIRdB+V23cWTXzYSO39V7mi7aIRBc4JA xhkT6tPUbXkaISOreXaRdRYyqywI07mx5Dq6IMSkUaurKUrslB+cc5up3Pc60PRi558n wB7SE7C7GfHIPtDi/rbW0yLBjcgzkOaOLugTdrdOYxJ1tDJ0L2ITLZOWSMe7LsFlKZ/j NF9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=NqfiJsyECBc1PhDMJ1UiCYytYgX6tG6TloGUqDwSNms=; b=7qlGEAtnVOg1EzE+x/71VppWppzSawgt9LX9ScGmIm16ldvCallxqaABeLOx+sBSQY /Eg9AA/wGRPh+NSO/aBJMzJ8FQ7lEQJo3UsyDA3rClgijSesle6eWksAGkv3O6T9yEe1 UvxAJ2WsRCOHs3WXozJgj6Kh408uyxmWWSB9Md63eponOkUuhs1GVR1FyhV9wczHBMBH A/7EJ4uKcSJw4JIHuwmiqx9eBWDeufH57LYbq1a3r2KcfzLya2xVCgIdkE9RtiUFc74H Fdfg0fNYMvjT4uEAcnW8ghsxDbCWmlR6X7DsJWdEHeTeJ41kf9nAfo/+Kk5UKrTaQZwv /DyQ== X-Gm-Message-State: AOAM530n2RScmC8iVH1jEWd0qjHLp0TAuPGOWa16gYN+kU/drUrHUkOF R2CrjEEk4XQcHhoJocHu5u875Rzt65w= X-Google-Smtp-Source: ABdhPJxduJzMebf/ne3vdH1CRLkl5pSMj0nTaroQgVysYF6s/gBBb933riSdCSnvHVzNNer/E99kYw== X-Received: by 2002:a5d:528b:: with SMTP id c11mr5250556wrv.369.1631213274933; Thu, 09 Sep 2021 11:47:54 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l16sm2816960wrh.44.2021.09.09.11.47.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:54 -0700 (PDT) Message-Id: <5a5c108d601ef6904f74597889deb09f8009c695.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:39 +0000 Subject: [PATCH v2 14/19] reftable: add a heap-based priority queue for reftable records Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This is needed to create a merged view multiple reftables Signed-off-by: Han-Wen Nienhuys --- Makefile | 2 + reftable/pq.c | 105 ++++++++++++++++++++++++++++++++++++++ reftable/pq.h | 33 ++++++++++++ reftable/pq_test.c | 82 +++++++++++++++++++++++++++++ reftable/reftable-tests.h | 1 + t/helper/test-reftable.c | 1 + 6 files changed, 224 insertions(+) create mode 100644 reftable/pq.c create mode 100644 reftable/pq.h create mode 100644 reftable/pq_test.c diff --git a/Makefile b/Makefile index 1c0bb449027..d2c9b0097a7 100644 --- a/Makefile +++ b/Makefile @@ -2462,6 +2462,7 @@ REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/iter.o REFTABLE_OBJS += reftable/publicbasics.o +REFTABLE_OBJS += reftable/pq.o REFTABLE_OBJS += reftable/reader.o REFTABLE_OBJS += reftable/record.o REFTABLE_OBJS += reftable/refname.o @@ -2472,6 +2473,7 @@ REFTABLE_OBJS += reftable/writer.o REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o +REFTABLE_TEST_OBJS += reftable/pq_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/readwrite_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o diff --git a/reftable/pq.c b/reftable/pq.c new file mode 100644 index 00000000000..efc474017a2 --- /dev/null +++ b/reftable/pq.c @@ -0,0 +1,105 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "pq.h" + +#include "reftable-record.h" +#include "system.h" +#include "basics.h" + +int pq_less(struct pq_entry *a, struct pq_entry *b) +{ + struct strbuf ak = STRBUF_INIT; + struct strbuf bk = STRBUF_INIT; + int cmp = 0; + reftable_record_key(&a->rec, &ak); + reftable_record_key(&b->rec, &bk); + + cmp = strbuf_cmp(&ak, &bk); + + strbuf_release(&ak); + strbuf_release(&bk); + + if (cmp == 0) + return a->index > b->index; + + return cmp < 0; +} + +struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq) +{ + return pq.heap[0]; +} + +int merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq) +{ + return pq.len == 0; +} + +struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq) +{ + int i = 0; + struct pq_entry e = pq->heap[0]; + pq->heap[0] = pq->heap[pq->len - 1]; + pq->len--; + + i = 0; + while (i < pq->len) { + int min = i; + int j = 2 * i + 1; + int k = 2 * i + 2; + if (j < pq->len && pq_less(&pq->heap[j], &pq->heap[i])) { + min = j; + } + if (k < pq->len && pq_less(&pq->heap[k], &pq->heap[min])) { + min = k; + } + + if (min == i) { + break; + } + + SWAP(pq->heap[i], pq->heap[min]); + i = min; + } + + return e; +} + +void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e) +{ + int i = 0; + if (pq->len == pq->cap) { + pq->cap = 2 * pq->cap + 1; + pq->heap = reftable_realloc(pq->heap, + pq->cap * sizeof(struct pq_entry)); + } + + pq->heap[pq->len++] = e; + i = pq->len - 1; + while (i > 0) { + int j = (i - 1) / 2; + if (pq_less(&pq->heap[j], &pq->heap[i])) { + break; + } + + SWAP(pq->heap[j], pq->heap[i]); + + i = j; + } +} + +void merged_iter_pqueue_release(struct merged_iter_pqueue *pq) +{ + int i = 0; + for (i = 0; i < pq->len; i++) { + reftable_record_destroy(&pq->heap[i].rec); + } + FREE_AND_NULL(pq->heap); + pq->len = pq->cap = 0; +} diff --git a/reftable/pq.h b/reftable/pq.h new file mode 100644 index 00000000000..56fc1b6d873 --- /dev/null +++ b/reftable/pq.h @@ -0,0 +1,33 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef PQ_H +#define PQ_H + +#include "record.h" + +struct pq_entry { + int index; + struct reftable_record rec; +}; + +struct merged_iter_pqueue { + struct pq_entry *heap; + size_t len; + size_t cap; +}; + +struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq); +int merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq); +void merged_iter_pqueue_check(struct merged_iter_pqueue pq); +struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq); +void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e); +void merged_iter_pqueue_release(struct merged_iter_pqueue *pq); +int pq_less(struct pq_entry *a, struct pq_entry *b); + +#endif diff --git a/reftable/pq_test.c b/reftable/pq_test.c new file mode 100644 index 00000000000..aaa86d12247 --- /dev/null +++ b/reftable/pq_test.c @@ -0,0 +1,82 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" + +#include "basics.h" +#include "constants.h" +#include "pq.h" +#include "record.h" +#include "reftable-tests.h" +#include "test_framework.h" + +void merged_iter_pqueue_check(struct merged_iter_pqueue pq) +{ + int i = 0; + for (i = 1; i < pq.len; i++) { + int parent = (i - 1) / 2; + + assert(pq_less(&pq.heap[parent], &pq.heap[i])); + } +} + +static void test_pq(void) +{ + char *names[54] = { NULL }; + int N = ARRAY_SIZE(names) - 1; + + struct merged_iter_pqueue pq = { NULL }; + const char *last = NULL; + + int i = 0; + for (i = 0; i < N; i++) { + char name[100]; + snprintf(name, sizeof(name), "%02d", i); + names[i] = xstrdup(name); + } + + i = 1; + do { + struct reftable_record rec = + reftable_new_record(BLOCK_TYPE_REF); + struct pq_entry e = { 0 }; + + reftable_record_as_ref(&rec)->refname = names[i]; + e.rec = rec; + merged_iter_pqueue_add(&pq, e); + merged_iter_pqueue_check(pq); + i = (i * 7) % N; + } while (i != 1); + + while (!merged_iter_pqueue_is_empty(pq)) { + struct pq_entry e = merged_iter_pqueue_remove(&pq); + struct reftable_ref_record *ref = + reftable_record_as_ref(&e.rec); + + merged_iter_pqueue_check(pq); + + if (last) { + EXPECT(strcmp(last, ref->refname) < 0); + } + last = ref->refname; + ref->refname = NULL; + reftable_free(ref); + } + + for (i = 0; i < N; i++) { + reftable_free(names[i]); + } + + merged_iter_pqueue_release(&pq); +} + +int pq_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_pq); + return 0; +} diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h index 3d541fa5c0c..0019cbcfa49 100644 --- a/reftable/reftable-tests.h +++ b/reftable/reftable-tests.h @@ -12,6 +12,7 @@ https://developers.google.com/open-source/licenses/bsd int basics_test_main(int argc, const char **argv); int block_test_main(int argc, const char **argv); int merged_test_main(int argc, const char **argv); +int pq_test_main(int argc, const char **argv); int record_test_main(int argc, const char **argv); int refname_test_main(int argc, const char **argv); int readwrite_test_main(int argc, const char **argv); diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 898aba836fd..0b5a1701df1 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -5,6 +5,7 @@ int cmd__reftable(int argc, const char **argv) { basics_test_main(argc, argv); block_test_main(argc, argv); + pq_test_main(argc, argv); record_test_main(argc, argv); readwrite_test_main(argc, argv); tree_test_main(argc, argv); From patchwork Thu Sep 9 18:47:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483687 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F81FC433EF for ; Thu, 9 Sep 2021 18:49:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3620610FF for ; Thu, 9 Sep 2021 18:49:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343890AbhIISud (ORCPT ); Thu, 9 Sep 2021 14:50:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245548AbhIIStO (ORCPT ); Thu, 9 Sep 2021 14:49:14 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A8D8C061793 for ; Thu, 9 Sep 2021 11:47:57 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id v10so3964114wrd.4 for ; Thu, 09 Sep 2021 11:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=0/Vjg5v3P17c/IL6O7TPWcCD3rMJthZoqILA+ZQv7OQ=; b=NIwWNeFoVHNKrK2B4L1rYnOaqXCld93IsceA2wDEA4VW+wB0dkq+qQsnKWy6TTxRSb uHxApY3CETKdZLf6Im7VckVtHeEmwTWoZMPyTSl6hWrVB9D10QBlhFYlPWCkuIIrxMLc 2pwSvQ8d/FHoq+RZXLaMaJi6fZYlPmEYEv/uQ9nTOlvpgIbrjCxFWICUkbJHEe7Z98/w +6Op6Ch0nzCzCyKAHeOuNj/CBpxOKomwZDY/WUnmWXcDVhSWqAGpHEL0M6jwZhR0f+t+ Ph9PavDn9RRSVLfN4DNtIEJPOHsr7//oXUmMjNdqW4smnMEdSdCo1Qek4ktJdYrvxvWI Mh8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=0/Vjg5v3P17c/IL6O7TPWcCD3rMJthZoqILA+ZQv7OQ=; b=oQUGqLBDOvCLfVF6p83UZsc9vUY9BJ3w6UOASuZfJSoFT98toX4R6Zv8l/clhEcMYi kYAFChAhrOFQLRyq6PE5w5YqtenLZH+rtYd8AHdkdZ8SX3FsyVfutfjxmj06qosGwAzk F2QazzY9EXrJDJIJaaIjwtGNl8KWWtAPCH5aE7kU64k4QLvlPEVtlum/YvlpFXSm3rqC ZuWM5A941pp1UE4QVw6ck4vcTGt64HiTfNapNyYCfITdLwxiJ1oJAowDKHlt5ar9iAth /QMMr4llJcxUCEBXYLlEcrGe8EoQ7RnQjee3oDAvm25SG2nSrrQYyCxgBdntuWDSGKtt jCUw== X-Gm-Message-State: AOAM532J2YiZaqLCICjETAuQBbVTLghHG8Cajzotc85iURBWwV0Ikxj7 Yk4v4Iy0B/gYWw2X3/Wmvhdeqbat9Us= X-Google-Smtp-Source: ABdhPJycTS6PgAwmjOnwX3JxOMHYAiWLCc5k4xdPo+wIIPsdyh+mMT8HIhokM1w7MlsYdnXZDpaosw== X-Received: by 2002:adf:fd92:: with SMTP id d18mr5457102wrr.28.1631213275552; Thu, 09 Sep 2021 11:47:55 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y15sm2553414wmi.18.2021.09.09.11.47.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:55 -0700 (PDT) Message-Id: <905ffa58f362c440ccbaf88583420c69f64d0c18.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:40 +0000 Subject: [PATCH v2 15/19] reftable: add merged table view Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This adds an abstract, read-only interface to the ref database. This primitive is used to construct the read view of the ref database (the read view is constructed by merging several *.ref files). It also provides the mechanism to provide a unified view of the refs in the main repository and the per-worktree refs. Signed-off-by: Han-Wen Nienhuys --- Makefile | 2 + reftable/merged.c | 362 +++++++++++++++++++++++++++++++++++++ reftable/merged.h | 35 ++++ reftable/merged_test.c | 292 ++++++++++++++++++++++++++++++ reftable/reftable-merged.h | 72 ++++++++ t/helper/test-reftable.c | 1 + 6 files changed, 764 insertions(+) create mode 100644 reftable/merged.c create mode 100644 reftable/merged.h create mode 100644 reftable/merged_test.c create mode 100644 reftable/reftable-merged.h diff --git a/Makefile b/Makefile index d2c9b0097a7..27f29779c9a 100644 --- a/Makefile +++ b/Makefile @@ -2462,6 +2462,7 @@ REFTABLE_OBJS += reftable/block.o REFTABLE_OBJS += reftable/blocksource.o REFTABLE_OBJS += reftable/iter.o REFTABLE_OBJS += reftable/publicbasics.o +REFTABLE_OBJS += reftable/merged.o REFTABLE_OBJS += reftable/pq.o REFTABLE_OBJS += reftable/reader.o REFTABLE_OBJS += reftable/record.o @@ -2473,6 +2474,7 @@ REFTABLE_OBJS += reftable/writer.o REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o +REFTABLE_TEST_OBJS += reftable/merged_test.o REFTABLE_TEST_OBJS += reftable/pq_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/readwrite_test.o diff --git a/reftable/merged.c b/reftable/merged.c new file mode 100644 index 00000000000..e5b53da6db3 --- /dev/null +++ b/reftable/merged.c @@ -0,0 +1,362 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "merged.h" + +#include "constants.h" +#include "iter.h" +#include "pq.h" +#include "reader.h" +#include "record.h" +#include "generic.h" +#include "reftable-merged.h" +#include "reftable-error.h" +#include "system.h" + +static int merged_iter_init(struct merged_iter *mi) +{ + int i = 0; + for (i = 0; i < mi->stack_len; i++) { + struct reftable_record rec = reftable_new_record(mi->typ); + int err = iterator_next(&mi->stack[i], &rec); + if (err < 0) { + return err; + } + + if (err > 0) { + reftable_iterator_destroy(&mi->stack[i]); + reftable_record_destroy(&rec); + } else { + struct pq_entry e = { + .rec = rec, + .index = i, + }; + merged_iter_pqueue_add(&mi->pq, e); + } + } + + return 0; +} + +static void merged_iter_close(void *p) +{ + struct merged_iter *mi = p; + int i = 0; + merged_iter_pqueue_release(&mi->pq); + for (i = 0; i < mi->stack_len; i++) { + reftable_iterator_destroy(&mi->stack[i]); + } + reftable_free(mi->stack); +} + +static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi, + size_t idx) +{ + struct reftable_record rec = reftable_new_record(mi->typ); + struct pq_entry e = { + .rec = rec, + .index = idx, + }; + int err = iterator_next(&mi->stack[idx], &rec); + if (err < 0) + return err; + + if (err > 0) { + reftable_iterator_destroy(&mi->stack[idx]); + reftable_record_destroy(&rec); + return 0; + } + + merged_iter_pqueue_add(&mi->pq, e); + return 0; +} + +static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx) +{ + if (iterator_is_null(&mi->stack[idx])) + return 0; + return merged_iter_advance_nonnull_subiter(mi, idx); +} + +static int merged_iter_next_entry(struct merged_iter *mi, + struct reftable_record *rec) +{ + struct strbuf entry_key = STRBUF_INIT; + struct pq_entry entry = { 0 }; + int err = 0; + + if (merged_iter_pqueue_is_empty(mi->pq)) + return 1; + + entry = merged_iter_pqueue_remove(&mi->pq); + err = merged_iter_advance_subiter(mi, entry.index); + if (err < 0) + return err; + + /* + One can also use reftable as datacenter-local storage, where the ref + database is maintained in globally consistent database (eg. + CockroachDB or Spanner). In this scenario, replication delays together + with compaction may cause newer tables to contain older entries. In + such a deployment, the loop below must be changed to collect all + entries for the same key, and return new the newest one. + */ + reftable_record_key(&entry.rec, &entry_key); + while (!merged_iter_pqueue_is_empty(mi->pq)) { + struct pq_entry top = merged_iter_pqueue_top(mi->pq); + struct strbuf k = STRBUF_INIT; + int err = 0, cmp = 0; + + reftable_record_key(&top.rec, &k); + + cmp = strbuf_cmp(&k, &entry_key); + strbuf_release(&k); + + if (cmp > 0) { + break; + } + + merged_iter_pqueue_remove(&mi->pq); + err = merged_iter_advance_subiter(mi, top.index); + if (err < 0) { + return err; + } + reftable_record_destroy(&top.rec); + } + + reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id)); + reftable_record_destroy(&entry.rec); + strbuf_release(&entry_key); + return 0; +} + +static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec) +{ + while (1) { + int err = merged_iter_next_entry(mi, rec); + if (err == 0 && mi->suppress_deletions && + reftable_record_is_deletion(rec)) { + continue; + } + + return err; + } +} + +static int merged_iter_next_void(void *p, struct reftable_record *rec) +{ + struct merged_iter *mi = p; + if (merged_iter_pqueue_is_empty(mi->pq)) + return 1; + + return merged_iter_next(mi, rec); +} + +static struct reftable_iterator_vtable merged_iter_vtable = { + .next = &merged_iter_next_void, + .close = &merged_iter_close, +}; + +static void iterator_from_merged_iter(struct reftable_iterator *it, + struct merged_iter *mi) +{ + assert(!it->ops); + it->iter_arg = mi; + it->ops = &merged_iter_vtable; +} + +int reftable_new_merged_table(struct reftable_merged_table **dest, + struct reftable_table *stack, int n, + uint32_t hash_id) +{ + struct reftable_merged_table *m = NULL; + uint64_t last_max = 0; + uint64_t first_min = 0; + int i = 0; + for (i = 0; i < n; i++) { + uint64_t min = reftable_table_min_update_index(&stack[i]); + uint64_t max = reftable_table_max_update_index(&stack[i]); + + if (reftable_table_hash_id(&stack[i]) != hash_id) { + return REFTABLE_FORMAT_ERROR; + } + if (i == 0 || min < first_min) { + first_min = min; + } + if (i == 0 || max > last_max) { + last_max = max; + } + } + + m = reftable_calloc(sizeof(struct reftable_merged_table)); + m->stack = stack; + m->stack_len = n; + m->min = first_min; + m->max = last_max; + m->hash_id = hash_id; + *dest = m; + return 0; +} + +/* clears the list of subtable, without affecting the readers themselves. */ +void merged_table_release(struct reftable_merged_table *mt) +{ + FREE_AND_NULL(mt->stack); + mt->stack_len = 0; +} + +void reftable_merged_table_free(struct reftable_merged_table *mt) +{ + if (!mt) { + return; + } + merged_table_release(mt); + reftable_free(mt); +} + +uint64_t +reftable_merged_table_max_update_index(struct reftable_merged_table *mt) +{ + return mt->max; +} + +uint64_t +reftable_merged_table_min_update_index(struct reftable_merged_table *mt) +{ + return mt->min; +} + +static int reftable_table_seek_record(struct reftable_table *tab, + struct reftable_iterator *it, + struct reftable_record *rec) +{ + return tab->ops->seek_record(tab->table_arg, it, rec); +} + +static int merged_table_seek_record(struct reftable_merged_table *mt, + struct reftable_iterator *it, + struct reftable_record *rec) +{ + struct reftable_iterator *iters = reftable_calloc( + sizeof(struct reftable_iterator) * mt->stack_len); + struct merged_iter merged = { + .stack = iters, + .typ = reftable_record_type(rec), + .hash_id = mt->hash_id, + .suppress_deletions = mt->suppress_deletions, + }; + int n = 0; + int err = 0; + int i = 0; + for (i = 0; i < mt->stack_len && err == 0; i++) { + int e = reftable_table_seek_record(&mt->stack[i], &iters[n], + rec); + if (e < 0) { + err = e; + } + if (e == 0) { + n++; + } + } + if (err < 0) { + int i = 0; + for (i = 0; i < n; i++) { + reftable_iterator_destroy(&iters[i]); + } + reftable_free(iters); + return err; + } + + merged.stack_len = n; + err = merged_iter_init(&merged); + if (err < 0) { + merged_iter_close(&merged); + return err; + } else { + struct merged_iter *p = + reftable_malloc(sizeof(struct merged_iter)); + *p = merged; + iterator_from_merged_iter(it, p); + } + return 0; +} + +int reftable_merged_table_seek_ref(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name) +{ + struct reftable_ref_record ref = { + .refname = (char *)name, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_ref(&rec, &ref); + return merged_table_seek_record(mt, it, &rec); +} + +int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name, uint64_t update_index) +{ + struct reftable_log_record log = { + .refname = (char *)name, + .update_index = update_index, + }; + struct reftable_record rec = { NULL }; + reftable_record_from_log(&rec, &log); + return merged_table_seek_record(mt, it, &rec); +} + +int reftable_merged_table_seek_log(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name) +{ + uint64_t max = ~((uint64_t)0); + return reftable_merged_table_seek_log_at(mt, it, name, max); +} + +uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt) +{ + return mt->hash_id; +} + +static int reftable_merged_table_seek_void(void *tab, + struct reftable_iterator *it, + struct reftable_record *rec) +{ + return merged_table_seek_record(tab, it, rec); +} + +static uint32_t reftable_merged_table_hash_id_void(void *tab) +{ + return reftable_merged_table_hash_id(tab); +} + +static uint64_t reftable_merged_table_min_update_index_void(void *tab) +{ + return reftable_merged_table_min_update_index(tab); +} + +static uint64_t reftable_merged_table_max_update_index_void(void *tab) +{ + return reftable_merged_table_max_update_index(tab); +} + +static struct reftable_table_vtable merged_table_vtable = { + .seek_record = reftable_merged_table_seek_void, + .hash_id = reftable_merged_table_hash_id_void, + .min_update_index = reftable_merged_table_min_update_index_void, + .max_update_index = reftable_merged_table_max_update_index_void, +}; + +void reftable_table_from_merged_table(struct reftable_table *tab, + struct reftable_merged_table *merged) +{ + assert(!tab->ops); + tab->ops = &merged_table_vtable; + tab->table_arg = merged; +} diff --git a/reftable/merged.h b/reftable/merged.h new file mode 100644 index 00000000000..8c4d4d58d77 --- /dev/null +++ b/reftable/merged.h @@ -0,0 +1,35 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef MERGED_H +#define MERGED_H + +#include "pq.h" + +struct reftable_merged_table { + struct reftable_table *stack; + size_t stack_len; + uint32_t hash_id; + int suppress_deletions; + + uint64_t min; + uint64_t max; +}; + +struct merged_iter { + struct reftable_iterator *stack; + uint32_t hash_id; + size_t stack_len; + uint8_t typ; + int suppress_deletions; + struct merged_iter_pqueue pq; +}; + +void merged_table_release(struct reftable_merged_table *mt); + +#endif diff --git a/reftable/merged_test.c b/reftable/merged_test.c new file mode 100644 index 00000000000..da607401141 --- /dev/null +++ b/reftable/merged_test.c @@ -0,0 +1,292 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "merged.h" + +#include "system.h" + +#include "basics.h" +#include "blocksource.h" +#include "constants.h" +#include "reader.h" +#include "record.h" +#include "test_framework.h" +#include "reftable-merged.h" +#include "reftable-tests.h" +#include "reftable-generic.h" +#include "reftable-writer.h" + +static void write_test_table(struct strbuf *buf, + struct reftable_ref_record refs[], int n) +{ + int min = 0xffffffff; + int max = 0; + int i = 0; + int err; + + struct reftable_write_options opts = { + .block_size = 256, + }; + struct reftable_writer *w = NULL; + for (i = 0; i < n; i++) { + uint64_t ui = refs[i].update_index; + if (ui > max) { + max = ui; + } + if (ui < min) { + min = ui; + } + } + + w = reftable_new_writer(&strbuf_add_void, buf, &opts); + reftable_writer_set_limits(w, min, max); + + for (i = 0; i < n; i++) { + uint64_t before = refs[i].update_index; + int n = reftable_writer_add_ref(w, &refs[i]); + EXPECT(n == 0); + EXPECT(before == refs[i].update_index); + } + + err = reftable_writer_close(w); + EXPECT_ERR(err); + + reftable_writer_free(w); +} + +static struct reftable_merged_table * +merged_table_from_records(struct reftable_ref_record **refs, + struct reftable_block_source **source, + struct reftable_reader ***readers, int *sizes, + struct strbuf *buf, int n) +{ + int i = 0; + struct reftable_merged_table *mt = NULL; + int err; + struct reftable_table *tabs = + reftable_calloc(n * sizeof(struct reftable_table)); + *readers = reftable_calloc(n * sizeof(struct reftable_reader *)); + *source = reftable_calloc(n * sizeof(**source)); + for (i = 0; i < n; i++) { + write_test_table(&buf[i], refs[i], sizes[i]); + block_source_from_strbuf(&(*source)[i], &buf[i]); + + err = reftable_new_reader(&(*readers)[i], &(*source)[i], + "name"); + EXPECT_ERR(err); + reftable_table_from_reader(&tabs[i], (*readers)[i]); + } + + err = reftable_new_merged_table(&mt, tabs, n, GIT_SHA1_FORMAT_ID); + EXPECT_ERR(err); + return mt; +} + +static void readers_destroy(struct reftable_reader **readers, size_t n) +{ + int i = 0; + for (; i < n; i++) + reftable_reader_free(readers[i]); + reftable_free(readers); +} + +static void test_merged_between(void) +{ + uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 }; + + struct reftable_ref_record r1[] = { { + .refname = "b", + .update_index = 1, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash1, + } }; + struct reftable_ref_record r2[] = { { + .refname = "a", + .update_index = 2, + .value_type = REFTABLE_REF_DELETION, + } }; + + struct reftable_ref_record *refs[] = { r1, r2 }; + int sizes[] = { 1, 1 }; + struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT }; + struct reftable_block_source *bs = NULL; + struct reftable_reader **readers = NULL; + struct reftable_merged_table *mt = + merged_table_from_records(refs, &bs, &readers, sizes, bufs, 2); + int i; + struct reftable_ref_record ref = { NULL }; + struct reftable_iterator it = { NULL }; + int err = reftable_merged_table_seek_ref(mt, &it, "a"); + EXPECT_ERR(err); + + err = reftable_iterator_next_ref(&it, &ref); + EXPECT_ERR(err); + EXPECT(ref.update_index == 2); + reftable_ref_record_release(&ref); + reftable_iterator_destroy(&it); + readers_destroy(readers, 2); + reftable_merged_table_free(mt); + for (i = 0; i < ARRAY_SIZE(bufs); i++) { + strbuf_release(&bufs[i]); + } + reftable_free(bs); +} + +static void test_merged(void) +{ + uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 }; + uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 }; + struct reftable_ref_record r1[] = { + { + .refname = "a", + .update_index = 1, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash1, + }, + { + .refname = "b", + .update_index = 1, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash1, + }, + { + .refname = "c", + .update_index = 1, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash1, + } + }; + struct reftable_ref_record r2[] = { { + .refname = "a", + .update_index = 2, + .value_type = REFTABLE_REF_DELETION, + } }; + struct reftable_ref_record r3[] = { + { + .refname = "c", + .update_index = 3, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash2, + }, + { + .refname = "d", + .update_index = 3, + .value_type = REFTABLE_REF_VAL1, + .value.val1 = hash1, + }, + }; + + struct reftable_ref_record want[] = { + r2[0], + r1[1], + r3[0], + r3[1], + }; + + struct reftable_ref_record *refs[] = { r1, r2, r3 }; + int sizes[3] = { 3, 1, 2 }; + struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT }; + struct reftable_block_source *bs = NULL; + struct reftable_reader **readers = NULL; + struct reftable_merged_table *mt = + merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3); + + struct reftable_iterator it = { NULL }; + int err = reftable_merged_table_seek_ref(mt, &it, "a"); + struct reftable_ref_record *out = NULL; + size_t len = 0; + size_t cap = 0; + int i = 0; + + EXPECT_ERR(err); + while (len < 100) { /* cap loops/recursion. */ + struct reftable_ref_record ref = { NULL }; + int err = reftable_iterator_next_ref(&it, &ref); + if (err > 0) { + break; + } + if (len == cap) { + cap = 2 * cap + 1; + out = reftable_realloc( + out, sizeof(struct reftable_ref_record) * cap); + } + out[len++] = ref; + } + reftable_iterator_destroy(&it); + + EXPECT(ARRAY_SIZE(want) == len); + for (i = 0; i < len; i++) { + EXPECT(reftable_ref_record_equal(&want[i], &out[i], + GIT_SHA1_RAWSZ)); + } + for (i = 0; i < len; i++) { + reftable_ref_record_release(&out[i]); + } + reftable_free(out); + + for (i = 0; i < 3; i++) { + strbuf_release(&bufs[i]); + } + readers_destroy(readers, 3); + reftable_merged_table_free(mt); + reftable_free(bs); +} + +static void test_default_write_opts(void) +{ + struct reftable_write_options opts = { 0 }; + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + + struct reftable_ref_record rec = { + .refname = "master", + .update_index = 1, + }; + int err; + struct reftable_block_source source = { NULL }; + struct reftable_table *tab = reftable_calloc(sizeof(*tab) * 1); + uint32_t hash_id; + struct reftable_reader *rd = NULL; + struct reftable_merged_table *merged = NULL; + + reftable_writer_set_limits(w, 1, 1); + + err = reftable_writer_add_ref(w, &rec); + EXPECT_ERR(err); + + err = reftable_writer_close(w); + EXPECT_ERR(err); + reftable_writer_free(w); + + block_source_from_strbuf(&source, &buf); + + err = reftable_new_reader(&rd, &source, "filename"); + EXPECT_ERR(err); + + hash_id = reftable_reader_hash_id(rd); + EXPECT(hash_id == GIT_SHA1_FORMAT_ID); + + reftable_table_from_reader(&tab[0], rd); + err = reftable_new_merged_table(&merged, tab, 1, GIT_SHA1_FORMAT_ID); + EXPECT_ERR(err); + + reftable_reader_free(rd); + reftable_merged_table_free(merged); + strbuf_release(&buf); +} + +/* XXX test refs_for(oid) */ + +int merged_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_merged_between); + RUN_TEST(test_merged); + RUN_TEST(test_default_write_opts); + return 0; +} diff --git a/reftable/reftable-merged.h b/reftable/reftable-merged.h new file mode 100644 index 00000000000..1a6d16915ab --- /dev/null +++ b/reftable/reftable-merged.h @@ -0,0 +1,72 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_MERGED_H +#define REFTABLE_MERGED_H + +#include "reftable-iterator.h" + +/* + * Merged tables + * + * A ref database kept in a sequence of table files. The merged_table presents a + * unified view to reading (seeking, iterating) a sequence of immutable tables. + * + * The merged tables are on purpose kept disconnected from their actual storage + * (eg. files on disk), because it is useful to merge tables aren't files. For + * example, the per-workspace and global ref namespace can be implemented as a + * merged table of two stacks of file-backed reftables. + */ + +/* A merged table is implements seeking/iterating over a stack of tables. */ +struct reftable_merged_table; + +/* A generic reftable; see below. */ +struct reftable_table; + +/* reftable_new_merged_table creates a new merged table. It takes ownership of + the stack array. +*/ +int reftable_new_merged_table(struct reftable_merged_table **dest, + struct reftable_table *stack, int n, + uint32_t hash_id); + +/* returns an iterator positioned just before 'name' */ +int reftable_merged_table_seek_ref(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name); + +/* returns an iterator for log entry, at given update_index */ +int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name, uint64_t update_index); + +/* like reftable_merged_table_seek_log_at but look for the newest entry. */ +int reftable_merged_table_seek_log(struct reftable_merged_table *mt, + struct reftable_iterator *it, + const char *name); + +/* returns the max update_index covered by this merged table. */ +uint64_t +reftable_merged_table_max_update_index(struct reftable_merged_table *mt); + +/* returns the min update_index covered by this merged table. */ +uint64_t +reftable_merged_table_min_update_index(struct reftable_merged_table *mt); + +/* releases memory for the merged_table */ +void reftable_merged_table_free(struct reftable_merged_table *m); + +/* return the hash ID of the merged table. */ +uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *m); + +/* create a generic table from reftable_merged_table */ +void reftable_table_from_merged_table(struct reftable_table *tab, + struct reftable_merged_table *table); + +#endif diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 0b5a1701df1..8087f2da4e6 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -5,6 +5,7 @@ int cmd__reftable(int argc, const char **argv) { basics_test_main(argc, argv); block_test_main(argc, argv); + merged_test_main(argc, argv); pq_test_main(argc, argv); record_test_main(argc, argv); readwrite_test_main(argc, argv); From patchwork Thu Sep 9 18:47:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483685 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A4A8C433EF for ; Thu, 9 Sep 2021 18:49:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 815AC610FF for ; Thu, 9 Sep 2021 18:49:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238960AbhIISuW (ORCPT ); Thu, 9 Sep 2021 14:50:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245559AbhIIStO (ORCPT ); Thu, 9 Sep 2021 14:49:14 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 912DFC061794 for ; Thu, 9 Sep 2021 11:47:57 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id i23so30077wrb.2 for ; Thu, 09 Sep 2021 11:47:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=niytwrRovUqVYof6uia0f4Mo0aS082Xgqfh1GdEq5Tw=; b=RG7fdQT0j5KAGhb0wG6RB8T79gtH/IIcg57zNbGwJdH4JtbFjmQUTTdwCqSptJOjoh TGFlBRloBz8AYGC9tyPMyyVKVJ1E2qX56Wid5h0vKdULXvp2S/NM7ZQ2CMqpTKvqfPs0 YDLmdubqCX2fWSUoJaCWM9cU/BxtqbhJAQ+atoqxWXMGw/ubham3mSaL9ws7V9pT7orc rQkkDhegcTIs9WFBt08TvtP8n0u/H34en+7wyBuxrqWWIjMXCc85WqrcqtKyjgmQ73Qo XTRJh5Jwbn3M17QBQEKS8IfXJ+6QT/llH7zKDliDYO5NrAoZsKfJSczsmKtHNWTExlgI 6FIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=niytwrRovUqVYof6uia0f4Mo0aS082Xgqfh1GdEq5Tw=; b=PXUK8oowoSQLgqrGUhrZBuwu51lTgcXGynSGH5yMXa+kj5XD1m0fQuh98kxt6raS0O ZXsPI01ysszzj4aVY5PB6d8S1TNWZmoBGdAi4d53Th6oCzLeQQm3i/H5NsA245+B2Emt eBsQuKkZd200oiS4C87VaYPedmkMLWVoTlTomtURlmYu9xbrC4cL56Li2o/nYYF/7eTO VIiF3IMApwsViWRCQRzHjzJBqPySSojrCADSsfqIYtyNrVleFNJr6nfAtMloylsiFWma kDdar/QP7CnC37YqXDR7DwnBBWK4rfRn+gZAkhA6EgRLZD3dvcsj43l6gHRYk+bAOWg+ wsUg== X-Gm-Message-State: AOAM53317DoviRZob/ih94thr1awWEneYB6ICzyYouqTqIfRPm3EDPzt Tqf1EJE4Fsr99SGs7mICAqWksKTnzhI= X-Google-Smtp-Source: ABdhPJyl53bnQya/hAoLL3MOn6gsUBYAiO4bRxo0sOK/2tnlaTpLZCY/8fpyzdorlb1nJAVrC6zX9A== X-Received: by 2002:a5d:5610:: with SMTP id l16mr5003115wrv.102.1631213276155; Thu, 09 Sep 2021 11:47:56 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j18sm2611466wrd.56.2021.09.09.11.47.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:55 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:41 +0000 Subject: [PATCH v2 16/19] reftable: implement refname validation Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys The packed/loose format has restrictions on refnames: a and a/b cannot coexist. This limitation does not apply to reftable per se, but must be maintained for interoperability. This code adds validation routines to abort transactions that are trying to add invalid names. Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + reftable/refname.c | 209 +++++++++++++++++++++++++++++++++++++++ reftable/refname.h | 29 ++++++ reftable/refname_test.c | 102 +++++++++++++++++++ t/helper/test-reftable.c | 1 + 5 files changed, 342 insertions(+) create mode 100644 reftable/refname.c create mode 100644 reftable/refname.h create mode 100644 reftable/refname_test.c diff --git a/Makefile b/Makefile index 27f29779c9a..76761ef16a6 100644 --- a/Makefile +++ b/Makefile @@ -2478,6 +2478,7 @@ REFTABLE_TEST_OBJS += reftable/merged_test.o REFTABLE_TEST_OBJS += reftable/pq_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/readwrite_test.o +REFTABLE_TEST_OBJS += reftable/refname_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o REFTABLE_TEST_OBJS += reftable/tree_test.o diff --git a/reftable/refname.c b/reftable/refname.c new file mode 100644 index 00000000000..95734969324 --- /dev/null +++ b/reftable/refname.c @@ -0,0 +1,209 @@ +/* + Copyright 2020 Google LLC + + Use of this source code is governed by a BSD-style + license that can be found in the LICENSE file or at + https://developers.google.com/open-source/licenses/bsd +*/ + +#include "system.h" +#include "reftable-error.h" +#include "basics.h" +#include "refname.h" +#include "reftable-iterator.h" + +struct find_arg { + char **names; + const char *want; +}; + +static int find_name(size_t k, void *arg) +{ + struct find_arg *f_arg = arg; + return strcmp(f_arg->names[k], f_arg->want) >= 0; +} + +static int modification_has_ref(struct modification *mod, const char *name) +{ + struct reftable_ref_record ref = { NULL }; + int err = 0; + + if (mod->add_len > 0) { + struct find_arg arg = { + .names = mod->add, + .want = name, + }; + int idx = binsearch(mod->add_len, find_name, &arg); + if (idx < mod->add_len && !strcmp(mod->add[idx], name)) { + return 0; + } + } + + if (mod->del_len > 0) { + struct find_arg arg = { + .names = mod->del, + .want = name, + }; + int idx = binsearch(mod->del_len, find_name, &arg); + if (idx < mod->del_len && !strcmp(mod->del[idx], name)) { + return 1; + } + } + + err = reftable_table_read_ref(&mod->tab, name, &ref); + reftable_ref_record_release(&ref); + return err; +} + +static void modification_release(struct modification *mod) +{ + /* don't delete the strings themselves; they're owned by ref records. + */ + FREE_AND_NULL(mod->add); + FREE_AND_NULL(mod->del); + mod->add_len = 0; + mod->del_len = 0; +} + +static int modification_has_ref_with_prefix(struct modification *mod, + const char *prefix) +{ + struct reftable_iterator it = { NULL }; + struct reftable_ref_record ref = { NULL }; + int err = 0; + + if (mod->add_len > 0) { + struct find_arg arg = { + .names = mod->add, + .want = prefix, + }; + int idx = binsearch(mod->add_len, find_name, &arg); + if (idx < mod->add_len && + !strncmp(prefix, mod->add[idx], strlen(prefix))) + goto done; + } + err = reftable_table_seek_ref(&mod->tab, &it, prefix); + if (err) + goto done; + + while (1) { + err = reftable_iterator_next_ref(&it, &ref); + if (err) + goto done; + + if (mod->del_len > 0) { + struct find_arg arg = { + .names = mod->del, + .want = ref.refname, + }; + int idx = binsearch(mod->del_len, find_name, &arg); + if (idx < mod->del_len && + !strcmp(ref.refname, mod->del[idx])) { + continue; + } + } + + if (strncmp(ref.refname, prefix, strlen(prefix))) { + err = 1; + goto done; + } + err = 0; + goto done; + } + +done: + reftable_ref_record_release(&ref); + reftable_iterator_destroy(&it); + return err; +} + +static int validate_refname(const char *name) +{ + while (1) { + char *next = strchr(name, '/'); + if (!*name) { + return REFTABLE_REFNAME_ERROR; + } + if (!next) { + return 0; + } + if (next - name == 0 || (next - name == 1 && *name == '.') || + (next - name == 2 && name[0] == '.' && name[1] == '.')) + return REFTABLE_REFNAME_ERROR; + name = next + 1; + } + return 0; +} + +int validate_ref_record_addition(struct reftable_table tab, + struct reftable_ref_record *recs, size_t sz) +{ + struct modification mod = { + .tab = tab, + .add = reftable_calloc(sizeof(char *) * sz), + .del = reftable_calloc(sizeof(char *) * sz), + }; + int i = 0; + int err = 0; + for (; i < sz; i++) { + if (reftable_ref_record_is_deletion(&recs[i])) { + mod.del[mod.del_len++] = recs[i].refname; + } else { + mod.add[mod.add_len++] = recs[i].refname; + } + } + + err = modification_validate(&mod); + modification_release(&mod); + return err; +} + +static void strbuf_trim_component(struct strbuf *sl) +{ + while (sl->len > 0) { + int is_slash = (sl->buf[sl->len - 1] == '/'); + strbuf_setlen(sl, sl->len - 1); + if (is_slash) + break; + } +} + +int modification_validate(struct modification *mod) +{ + struct strbuf slashed = STRBUF_INIT; + int err = 0; + int i = 0; + for (; i < mod->add_len; i++) { + err = validate_refname(mod->add[i]); + if (err) + goto done; + strbuf_reset(&slashed); + strbuf_addstr(&slashed, mod->add[i]); + strbuf_addstr(&slashed, "/"); + + err = modification_has_ref_with_prefix(mod, slashed.buf); + if (err == 0) { + err = REFTABLE_NAME_CONFLICT; + goto done; + } + if (err < 0) + goto done; + + strbuf_reset(&slashed); + strbuf_addstr(&slashed, mod->add[i]); + while (slashed.len) { + strbuf_trim_component(&slashed); + err = modification_has_ref(mod, slashed.buf); + if (err == 0) { + err = REFTABLE_NAME_CONFLICT; + goto done; + } + if (err < 0) + goto done; + } + } + err = 0; +done: + strbuf_release(&slashed); + return err; +} diff --git a/reftable/refname.h b/reftable/refname.h new file mode 100644 index 00000000000..a24b40fcb42 --- /dev/null +++ b/reftable/refname.h @@ -0,0 +1,29 @@ +/* + Copyright 2020 Google LLC + + Use of this source code is governed by a BSD-style + license that can be found in the LICENSE file or at + https://developers.google.com/open-source/licenses/bsd +*/ +#ifndef REFNAME_H +#define REFNAME_H + +#include "reftable-record.h" +#include "reftable-generic.h" + +struct modification { + struct reftable_table tab; + + char **add; + size_t add_len; + + char **del; + size_t del_len; +}; + +int validate_ref_record_addition(struct reftable_table tab, + struct reftable_ref_record *recs, size_t sz); + +int modification_validate(struct modification *mod); + +#endif diff --git a/reftable/refname_test.c b/reftable/refname_test.c new file mode 100644 index 00000000000..8645cd93bbd --- /dev/null +++ b/reftable/refname_test.c @@ -0,0 +1,102 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "basics.h" +#include "block.h" +#include "blocksource.h" +#include "constants.h" +#include "reader.h" +#include "record.h" +#include "refname.h" +#include "reftable-error.h" +#include "reftable-writer.h" +#include "system.h" + +#include "test_framework.h" +#include "reftable-tests.h" + +struct testcase { + char *add; + char *del; + int error_code; +}; + +static void test_conflict(void) +{ + struct reftable_write_options opts = { 0 }; + struct strbuf buf = STRBUF_INIT; + struct reftable_writer *w = + reftable_new_writer(&strbuf_add_void, &buf, &opts); + struct reftable_ref_record rec = { + .refname = "a/b", + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "destination", /* make sure it's not a symref. + */ + .update_index = 1, + }; + int err; + int i; + struct reftable_block_source source = { NULL }; + struct reftable_reader *rd = NULL; + struct reftable_table tab = { NULL }; + struct testcase cases[] = { + { "a/b/c", NULL, REFTABLE_NAME_CONFLICT }, + { "b", NULL, 0 }, + { "a", NULL, REFTABLE_NAME_CONFLICT }, + { "a", "a/b", 0 }, + + { "p/", NULL, REFTABLE_REFNAME_ERROR }, + { "p//q", NULL, REFTABLE_REFNAME_ERROR }, + { "p/./q", NULL, REFTABLE_REFNAME_ERROR }, + { "p/../q", NULL, REFTABLE_REFNAME_ERROR }, + + { "a/b/c", "a/b", 0 }, + { NULL, "a//b", 0 }, + }; + reftable_writer_set_limits(w, 1, 1); + + err = reftable_writer_add_ref(w, &rec); + EXPECT_ERR(err); + + err = reftable_writer_close(w); + EXPECT_ERR(err); + reftable_writer_free(w); + + block_source_from_strbuf(&source, &buf); + err = reftable_new_reader(&rd, &source, "filename"); + EXPECT_ERR(err); + + reftable_table_from_reader(&tab, rd); + + for (i = 0; i < ARRAY_SIZE(cases); i++) { + struct modification mod = { + .tab = tab, + }; + + if (cases[i].add) { + mod.add = &cases[i].add; + mod.add_len = 1; + } + if (cases[i].del) { + mod.del = &cases[i].del; + mod.del_len = 1; + } + + err = modification_validate(&mod); + EXPECT(err == cases[i].error_code); + } + + reftable_reader_free(rd); + strbuf_release(&buf); +} + +int refname_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_conflict); + return 0; +} diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 8087f2da4e6..c8db6852c35 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -8,6 +8,7 @@ int cmd__reftable(int argc, const char **argv) merged_test_main(argc, argv); pq_test_main(argc, argv); record_test_main(argc, argv); + refname_test_main(argc, argv); readwrite_test_main(argc, argv); tree_test_main(argc, argv); return 0; From patchwork Thu Sep 9 18:47:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F633C433FE for ; Thu, 9 Sep 2021 18:49:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDF72610FF for ; Thu, 9 Sep 2021 18:49:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343978AbhIISue (ORCPT ); Thu, 9 Sep 2021 14:50:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245618AbhIIStO (ORCPT ); Thu, 9 Sep 2021 14:49:14 -0400 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE2B6C0617A6 for ; Thu, 9 Sep 2021 11:47:58 -0700 (PDT) Received: by mail-wr1-x42c.google.com with SMTP id t18so3981390wrb.0 for ; Thu, 09 Sep 2021 11:47:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=2UGrxnUJGsOjc7kixIwbIh5rVyC5Uc/qRvgLGYMIX+E=; b=P9ThLWKfXTDojWHGo1DSj0YsGdJrl+1OUy6UlpSfVU30Xw63pv03oAHim9jLAggBgN D3D9052J8pyBPG0sDPO47+bs8ZXFv5MCtSiZV0siGg/TTWu301Pt2p+oXC+H5sbo1oqY mjMFMO1iXeUpr2GVAr+DNDkmIJeTqz4OuGFbP2O7SIeqZUf9b0k9cA7uKwKoeuqsGLjt wp3MTk1fts9+3g63QCmrZj2rl3yBS4ceob02p05iiLRYkuTSHF1aEzGBo9KbdMMbRo5F bb29ukeR2UCfEhCMyVZ/3V+dmE7znjB69XncMSWF261YJVSs6h03ZDMw8av6KkmgHKun 7sbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=2UGrxnUJGsOjc7kixIwbIh5rVyC5Uc/qRvgLGYMIX+E=; b=cJI7DZHqcj3pde7NY3AIh1U/iluVWeaDAgWBJS+ocroaiG6LdHcK5LR+2wla4wJiCl apwhWy1BojfUnuvhsyjYQGc8XoyWlJUA86y7pMY7Fa6yc/AU3lMB85YTJo7H362nZUIW UMKLI3m6Xjunlb3nYrVTTiwzaIc0nVTaK8U3BAOYysPFQM+A04j5/atbrTdatwPsZp8b Mtq2YxZg5NDF7thnb4IE0aH/0fUjjSfau62kDW1h+V6zeaeWFSgHZWRQ6T1p9c0Ckqj2 cE5vz/UKcXZ5DGdDfiXoQKoJPjaGIGeBXwqHMYLxh81m5gqqmXhoELnYLI4Z71mVhuUe 2GCA== X-Gm-Message-State: AOAM5333K7XciAzIt6s2a/hjMh135c3jAjQI4h5Ouz7cPOsyjk+WgxPX lbfH0zIciVkWSk9X52CjYdjRDOjoIqQ= X-Google-Smtp-Source: ABdhPJxC5ZjwMQVYtVVsirt2Z+zKb1UdGG9nriMbGgS5id+8tkXrE6ehWIZ/3jzb2GS7zmY40rNlRw== X-Received: by 2002:a05:6000:160f:: with SMTP id u15mr5508817wrb.166.1631213276975; Thu, 09 Sep 2021 11:47:56 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x21sm2578446wmc.14.2021.09.09.11.47.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:56 -0700 (PDT) Message-Id: <8b0ee68c636a10eca73092e7a4c823811977ccc8.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:42 +0000 Subject: [PATCH v2 17/19] reftable: implement stack, a mutable database of reftable files. Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + reftable/reftable-stack.h | 128 ++++ reftable/stack.c | 1396 +++++++++++++++++++++++++++++++++++++ reftable/stack.h | 41 ++ reftable/stack_test.c | 949 +++++++++++++++++++++++++ t/helper/test-reftable.c | 1 + 6 files changed, 2516 insertions(+) create mode 100644 reftable/reftable-stack.h create mode 100644 reftable/stack.c create mode 100644 reftable/stack.h create mode 100644 reftable/stack_test.c diff --git a/Makefile b/Makefile index 76761ef16a6..619b1b15ce0 100644 --- a/Makefile +++ b/Makefile @@ -2479,6 +2479,7 @@ REFTABLE_TEST_OBJS += reftable/pq_test.o REFTABLE_TEST_OBJS += reftable/record_test.o REFTABLE_TEST_OBJS += reftable/readwrite_test.o REFTABLE_TEST_OBJS += reftable/refname_test.o +REFTABLE_TEST_OBJS += reftable/stack_test.o REFTABLE_TEST_OBJS += reftable/test_framework.o REFTABLE_TEST_OBJS += reftable/tree_test.o diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h new file mode 100644 index 00000000000..1b602dda58a --- /dev/null +++ b/reftable/reftable-stack.h @@ -0,0 +1,128 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef REFTABLE_STACK_H +#define REFTABLE_STACK_H + +#include "reftable-writer.h" + +/* + * The stack presents an interface to a mutable sequence of reftables. + + * A stack can be mutated by pushing a table to the top of the stack. + + * The reftable_stack automatically compacts files on disk to ensure good + * amortized performance. + * + * For windows and other platforms that cannot have open files as rename + * destinations, concurrent access from multiple processes needs the rand() + * random seed to be randomized. + */ +struct reftable_stack; + +/* open a new reftable stack. The tables along with the table list will be + * stored in 'dir'. Typically, this should be .git/reftables. + */ +int reftable_new_stack(struct reftable_stack **dest, const char *dir, + struct reftable_write_options config); + +/* returns the update_index at which a next table should be written. */ +uint64_t reftable_stack_next_update_index(struct reftable_stack *st); + +/* holds a transaction to add tables at the top of a stack. */ +struct reftable_addition; + +/* + * returns a new transaction to add reftables to the given stack. As a side + * effect, the ref database is locked. + */ +int reftable_stack_new_addition(struct reftable_addition **dest, + struct reftable_stack *st); + +/* Adds a reftable to transaction. */ +int reftable_addition_add(struct reftable_addition *add, + int (*write_table)(struct reftable_writer *wr, + void *arg), + void *arg); + +/* Commits the transaction, releasing the lock. After calling this, + * reftable_addition_destroy should still be called. + */ +int reftable_addition_commit(struct reftable_addition *add); + +/* Release all non-committed data from the transaction, and deallocate the + * transaction. Releases the lock if held. */ +void reftable_addition_destroy(struct reftable_addition *add); + +/* add a new table to the stack. The write_table function must call + * reftable_writer_set_limits, add refs and return an error value. */ +int reftable_stack_add(struct reftable_stack *st, + int (*write_table)(struct reftable_writer *wr, + void *write_arg), + void *write_arg); + +/* returns the merged_table for seeking. This table is valid until the + * next write or reload, and should not be closed or deleted. + */ +struct reftable_merged_table * +reftable_stack_merged_table(struct reftable_stack *st); + +/* frees all resources associated with the stack. */ +void reftable_stack_destroy(struct reftable_stack *st); + +/* Reloads the stack if necessary. This is very cheap to run if the stack was up + * to date */ +int reftable_stack_reload(struct reftable_stack *st); + +/* Policy for expiring reflog entries. */ +struct reftable_log_expiry_config { + /* Drop entries older than this timestamp */ + uint64_t time; + + /* Drop older entries */ + uint64_t min_update_index; +}; + +/* compacts all reftables into a giant table. Expire reflog entries if config is + * non-NULL */ +int reftable_stack_compact_all(struct reftable_stack *st, + struct reftable_log_expiry_config *config); + +/* heuristically compact unbalanced table stack. */ +int reftable_stack_auto_compact(struct reftable_stack *st); + +/* delete stale .ref tables. */ +int reftable_stack_clean(struct reftable_stack *st); + +/* convenience function to read a single ref. Returns < 0 for error, 0 for + * success, and 1 if ref not found. */ +int reftable_stack_read_ref(struct reftable_stack *st, const char *refname, + struct reftable_ref_record *ref); + +/* convenience function to read a single log. Returns < 0 for error, 0 for + * success, and 1 if ref not found. */ +int reftable_stack_read_log(struct reftable_stack *st, const char *refname, + struct reftable_log_record *log); + +/* statistics on past compactions. */ +struct reftable_compaction_stats { + uint64_t bytes; /* total number of bytes written */ + uint64_t entries_written; /* total number of entries written, including + failures. */ + int attempts; /* how often we tried to compact */ + int failures; /* failures happen on concurrent updates */ +}; + +/* return statistics for compaction up till now. */ +struct reftable_compaction_stats * +reftable_stack_compaction_stats(struct reftable_stack *st); + +/* print the entire stack represented by the directory */ +int reftable_stack_print_directory(const char *stackdir, uint32_t hash_id); + +#endif diff --git a/reftable/stack.c b/reftable/stack.c new file mode 100644 index 00000000000..48e22a6c184 --- /dev/null +++ b/reftable/stack.c @@ -0,0 +1,1396 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "stack.h" + +#include "system.h" +#include "merged.h" +#include "reader.h" +#include "refname.h" +#include "reftable-error.h" +#include "reftable-record.h" +#include "reftable-merged.h" +#include "writer.h" + +static int stack_try_add(struct reftable_stack *st, + int (*write_table)(struct reftable_writer *wr, + void *arg), + void *arg); +static int stack_write_compact(struct reftable_stack *st, + struct reftable_writer *wr, int first, int last, + struct reftable_log_expiry_config *config); +static int stack_check_addition(struct reftable_stack *st, + const char *new_tab_name); +static void reftable_addition_close(struct reftable_addition *add); +static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, + int reuse_open); + +static void stack_filename(struct strbuf *dest, struct reftable_stack *st, + const char *name) +{ + strbuf_reset(dest); + strbuf_addstr(dest, st->reftable_dir); + strbuf_addstr(dest, "/"); + strbuf_addstr(dest, name); +} + +static ssize_t reftable_fd_write(void *arg, const void *data, size_t sz) +{ + int *fdp = (int *)arg; + return write(*fdp, data, sz); +} + +int reftable_new_stack(struct reftable_stack **dest, const char *dir, + struct reftable_write_options config) +{ + struct reftable_stack *p = + reftable_calloc(sizeof(struct reftable_stack)); + struct strbuf list_file_name = STRBUF_INIT; + int err = 0; + + if (config.hash_id == 0) { + config.hash_id = GIT_SHA1_FORMAT_ID; + } + + *dest = NULL; + + strbuf_reset(&list_file_name); + strbuf_addstr(&list_file_name, dir); + strbuf_addstr(&list_file_name, "/tables.list"); + + p->list_file = strbuf_detach(&list_file_name, NULL); + p->reftable_dir = xstrdup(dir); + p->config = config; + + err = reftable_stack_reload_maybe_reuse(p, 1); + if (err < 0) { + reftable_stack_destroy(p); + } else { + *dest = p; + } + return err; +} + +static int fd_read_lines(int fd, char ***namesp) +{ + off_t size = lseek(fd, 0, SEEK_END); + char *buf = NULL; + int err = 0; + if (size < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + err = lseek(fd, 0, SEEK_SET); + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + buf = reftable_malloc(size + 1); + if (read(fd, buf, size) != size) { + err = REFTABLE_IO_ERROR; + goto done; + } + buf[size] = 0; + + parse_names(buf, size, namesp); + +done: + reftable_free(buf); + return err; +} + +int read_lines(const char *filename, char ***namesp) +{ + int fd = open(filename, O_RDONLY, 0644); + int err = 0; + if (fd < 0) { + if (errno == ENOENT) { + *namesp = reftable_calloc(sizeof(char *)); + return 0; + } + + return REFTABLE_IO_ERROR; + } + err = fd_read_lines(fd, namesp); + close(fd); + return err; +} + +struct reftable_merged_table * +reftable_stack_merged_table(struct reftable_stack *st) +{ + return st->merged; +} + +static int has_name(char **names, const char *name) +{ + while (*names) { + if (!strcmp(*names, name)) + return 1; + names++; + } + return 0; +} + +/* Close and free the stack */ +void reftable_stack_destroy(struct reftable_stack *st) +{ + char **names = NULL; + int err = 0; + if (st->merged) { + reftable_merged_table_free(st->merged); + st->merged = NULL; + } + + err = read_lines(st->list_file, &names); + if (err < 0) { + FREE_AND_NULL(names); + } + + if (st->readers) { + int i = 0; + struct strbuf filename = STRBUF_INIT; + for (i = 0; i < st->readers_len; i++) { + const char *name = reader_name(st->readers[i]); + strbuf_reset(&filename); + if (names && !has_name(names, name)) { + stack_filename(&filename, st, name); + } + reftable_reader_free(st->readers[i]); + + if (filename.len) { + /* On Windows, can only unlink after closing. */ + unlink(filename.buf); + } + } + strbuf_release(&filename); + st->readers_len = 0; + FREE_AND_NULL(st->readers); + } + FREE_AND_NULL(st->list_file); + FREE_AND_NULL(st->reftable_dir); + reftable_free(st); + free_names(names); +} + +static struct reftable_reader **stack_copy_readers(struct reftable_stack *st, + int cur_len) +{ + struct reftable_reader **cur = + reftable_calloc(sizeof(struct reftable_reader *) * cur_len); + int i = 0; + for (i = 0; i < cur_len; i++) { + cur[i] = st->readers[i]; + } + return cur; +} + +static int reftable_stack_reload_once(struct reftable_stack *st, char **names, + int reuse_open) +{ + int cur_len = !st->merged ? 0 : st->merged->stack_len; + struct reftable_reader **cur = stack_copy_readers(st, cur_len); + int err = 0; + int names_len = names_length(names); + struct reftable_reader **new_readers = + reftable_calloc(sizeof(struct reftable_reader *) * names_len); + struct reftable_table *new_tables = + reftable_calloc(sizeof(struct reftable_table) * names_len); + int new_readers_len = 0; + struct reftable_merged_table *new_merged = NULL; + int i; + + while (*names) { + struct reftable_reader *rd = NULL; + char *name = *names++; + + /* this is linear; we assume compaction keeps the number of + tables under control so this is not quadratic. */ + int j = 0; + for (j = 0; reuse_open && j < cur_len; j++) { + if (cur[j] && 0 == strcmp(cur[j]->name, name)) { + rd = cur[j]; + cur[j] = NULL; + break; + } + } + + if (!rd) { + struct reftable_block_source src = { NULL }; + struct strbuf table_path = STRBUF_INIT; + stack_filename(&table_path, st, name); + + err = reftable_block_source_from_file(&src, + table_path.buf); + strbuf_release(&table_path); + + if (err < 0) + goto done; + + err = reftable_new_reader(&rd, &src, name); + if (err < 0) + goto done; + } + + new_readers[new_readers_len] = rd; + reftable_table_from_reader(&new_tables[new_readers_len], rd); + new_readers_len++; + } + + /* success! */ + err = reftable_new_merged_table(&new_merged, new_tables, + new_readers_len, st->config.hash_id); + if (err < 0) + goto done; + + new_tables = NULL; + st->readers_len = new_readers_len; + if (st->merged) { + merged_table_release(st->merged); + reftable_merged_table_free(st->merged); + } + if (st->readers) { + reftable_free(st->readers); + } + st->readers = new_readers; + new_readers = NULL; + new_readers_len = 0; + + new_merged->suppress_deletions = 1; + st->merged = new_merged; + for (i = 0; i < cur_len; i++) { + if (cur[i]) { + const char *name = reader_name(cur[i]); + struct strbuf filename = STRBUF_INIT; + stack_filename(&filename, st, name); + + reader_close(cur[i]); + reftable_reader_free(cur[i]); + + /* On Windows, can only unlink after closing. */ + unlink(filename.buf); + + strbuf_release(&filename); + } + } + +done: + for (i = 0; i < new_readers_len; i++) { + reader_close(new_readers[i]); + reftable_reader_free(new_readers[i]); + } + reftable_free(new_readers); + reftable_free(new_tables); + reftable_free(cur); + return err; +} + +/* return negative if a before b. */ +static int tv_cmp(struct timeval *a, struct timeval *b) +{ + time_t diff = a->tv_sec - b->tv_sec; + int udiff = a->tv_usec - b->tv_usec; + + if (diff != 0) + return diff; + + return udiff; +} + +static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, + int reuse_open) +{ + struct timeval deadline = { 0 }; + int err = gettimeofday(&deadline, NULL); + int64_t delay = 0; + int tries = 0; + if (err < 0) + return err; + + deadline.tv_sec += 3; + while (1) { + char **names = NULL; + char **names_after = NULL; + struct timeval now = { 0 }; + int err = gettimeofday(&now, NULL); + int err2 = 0; + if (err < 0) { + return err; + } + + /* Only look at deadlines after the first few times. This + simplifies debugging in GDB */ + tries++; + if (tries > 3 && tv_cmp(&now, &deadline) >= 0) { + break; + } + + err = read_lines(st->list_file, &names); + if (err < 0) { + free_names(names); + return err; + } + err = reftable_stack_reload_once(st, names, reuse_open); + if (err == 0) { + free_names(names); + break; + } + if (err != REFTABLE_NOT_EXIST_ERROR) { + free_names(names); + return err; + } + + /* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent + writer. Check if there was one by checking if the name list + changed. + */ + err2 = read_lines(st->list_file, &names_after); + if (err2 < 0) { + free_names(names); + return err2; + } + + if (names_equal(names_after, names)) { + free_names(names); + free_names(names_after); + return err; + } + free_names(names); + free_names(names_after); + + delay = delay + (delay * rand()) / RAND_MAX + 1; + sleep_millisec(delay); + } + + return 0; +} + +/* -1 = error + 0 = up to date + 1 = changed. */ +static int stack_uptodate(struct reftable_stack *st) +{ + char **names = NULL; + int err = read_lines(st->list_file, &names); + int i = 0; + if (err < 0) + return err; + + for (i = 0; i < st->readers_len; i++) { + if (!names[i]) { + err = 1; + goto done; + } + + if (strcmp(st->readers[i]->name, names[i])) { + err = 1; + goto done; + } + } + + if (names[st->merged->stack_len]) { + err = 1; + goto done; + } + +done: + free_names(names); + return err; +} + +int reftable_stack_reload(struct reftable_stack *st) +{ + int err = stack_uptodate(st); + if (err > 0) + return reftable_stack_reload_maybe_reuse(st, 1); + return err; +} + +int reftable_stack_add(struct reftable_stack *st, + int (*write)(struct reftable_writer *wr, void *arg), + void *arg) +{ + int err = stack_try_add(st, write, arg); + if (err < 0) { + if (err == REFTABLE_LOCK_ERROR) { + /* Ignore error return, we want to propagate + REFTABLE_LOCK_ERROR. + */ + reftable_stack_reload(st); + } + return err; + } + + if (!st->disable_auto_compact) + return reftable_stack_auto_compact(st); + + return 0; +} + +static void format_name(struct strbuf *dest, uint64_t min, uint64_t max) +{ + char buf[100]; + uint32_t rnd = (uint32_t)rand(); + snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64 "-%08x", + min, max, rnd); + strbuf_reset(dest); + strbuf_addstr(dest, buf); +} + +struct reftable_addition { + int lock_file_fd; + struct strbuf lock_file_name; + struct reftable_stack *stack; + + char **new_tables; + int new_tables_len; + uint64_t next_update_index; +}; + +#define REFTABLE_ADDITION_INIT \ + { \ + .lock_file_name = STRBUF_INIT \ + } + +static int reftable_stack_init_addition(struct reftable_addition *add, + struct reftable_stack *st) +{ + int err = 0; + add->stack = st; + + strbuf_reset(&add->lock_file_name); + strbuf_addstr(&add->lock_file_name, st->list_file); + strbuf_addstr(&add->lock_file_name, ".lock"); + + add->lock_file_fd = open(add->lock_file_name.buf, + O_EXCL | O_CREAT | O_WRONLY, 0644); + if (add->lock_file_fd < 0) { + if (errno == EEXIST) { + err = REFTABLE_LOCK_ERROR; + } else { + err = REFTABLE_IO_ERROR; + } + goto done; + } + err = stack_uptodate(st); + if (err < 0) + goto done; + + if (err > 1) { + err = REFTABLE_LOCK_ERROR; + goto done; + } + + add->next_update_index = reftable_stack_next_update_index(st); +done: + if (err) { + reftable_addition_close(add); + } + return err; +} + +static void reftable_addition_close(struct reftable_addition *add) +{ + int i = 0; + struct strbuf nm = STRBUF_INIT; + for (i = 0; i < add->new_tables_len; i++) { + stack_filename(&nm, add->stack, add->new_tables[i]); + unlink(nm.buf); + reftable_free(add->new_tables[i]); + add->new_tables[i] = NULL; + } + reftable_free(add->new_tables); + add->new_tables = NULL; + add->new_tables_len = 0; + + if (add->lock_file_fd > 0) { + close(add->lock_file_fd); + add->lock_file_fd = 0; + } + if (add->lock_file_name.len > 0) { + unlink(add->lock_file_name.buf); + strbuf_release(&add->lock_file_name); + } + + strbuf_release(&nm); +} + +void reftable_addition_destroy(struct reftable_addition *add) +{ + if (!add) { + return; + } + reftable_addition_close(add); + reftable_free(add); +} + +int reftable_addition_commit(struct reftable_addition *add) +{ + struct strbuf table_list = STRBUF_INIT; + int i = 0; + int err = 0; + if (add->new_tables_len == 0) + goto done; + + for (i = 0; i < add->stack->merged->stack_len; i++) { + strbuf_addstr(&table_list, add->stack->readers[i]->name); + strbuf_addstr(&table_list, "\n"); + } + for (i = 0; i < add->new_tables_len; i++) { + strbuf_addstr(&table_list, add->new_tables[i]); + strbuf_addstr(&table_list, "\n"); + } + + err = write(add->lock_file_fd, table_list.buf, table_list.len); + strbuf_release(&table_list); + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + err = close(add->lock_file_fd); + add->lock_file_fd = 0; + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + err = rename(add->lock_file_name.buf, add->stack->list_file); + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + /* success, no more state to clean up. */ + strbuf_release(&add->lock_file_name); + for (i = 0; i < add->new_tables_len; i++) { + reftable_free(add->new_tables[i]); + } + reftable_free(add->new_tables); + add->new_tables = NULL; + add->new_tables_len = 0; + + err = reftable_stack_reload(add->stack); +done: + reftable_addition_close(add); + return err; +} + +int reftable_stack_new_addition(struct reftable_addition **dest, + struct reftable_stack *st) +{ + int err = 0; + struct reftable_addition empty = REFTABLE_ADDITION_INIT; + *dest = reftable_calloc(sizeof(**dest)); + **dest = empty; + err = reftable_stack_init_addition(*dest, st); + if (err) { + reftable_free(*dest); + *dest = NULL; + } + return err; +} + +static int stack_try_add(struct reftable_stack *st, + int (*write_table)(struct reftable_writer *wr, + void *arg), + void *arg) +{ + struct reftable_addition add = REFTABLE_ADDITION_INIT; + int err = reftable_stack_init_addition(&add, st); + if (err < 0) + goto done; + if (err > 0) { + err = REFTABLE_LOCK_ERROR; + goto done; + } + + err = reftable_addition_add(&add, write_table, arg); + if (err < 0) + goto done; + + err = reftable_addition_commit(&add); +done: + reftable_addition_close(&add); + return err; +} + +int reftable_addition_add(struct reftable_addition *add, + int (*write_table)(struct reftable_writer *wr, + void *arg), + void *arg) +{ + struct strbuf temp_tab_file_name = STRBUF_INIT; + struct strbuf tab_file_name = STRBUF_INIT; + struct strbuf next_name = STRBUF_INIT; + struct reftable_writer *wr = NULL; + int err = 0; + int tab_fd = 0; + + strbuf_reset(&next_name); + format_name(&next_name, add->next_update_index, add->next_update_index); + + stack_filename(&temp_tab_file_name, add->stack, next_name.buf); + strbuf_addstr(&temp_tab_file_name, ".temp.XXXXXX"); + + tab_fd = mkstemp(temp_tab_file_name.buf); + if (tab_fd < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + wr = reftable_new_writer(reftable_fd_write, &tab_fd, + &add->stack->config); + err = write_table(wr, arg); + if (err < 0) + goto done; + + err = reftable_writer_close(wr); + if (err == REFTABLE_EMPTY_TABLE_ERROR) { + err = 0; + goto done; + } + if (err < 0) + goto done; + + err = close(tab_fd); + tab_fd = 0; + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + err = stack_check_addition(add->stack, temp_tab_file_name.buf); + if (err < 0) + goto done; + + if (wr->min_update_index < add->next_update_index) { + err = REFTABLE_API_ERROR; + goto done; + } + + format_name(&next_name, wr->min_update_index, wr->max_update_index); + strbuf_addstr(&next_name, ".ref"); + + stack_filename(&tab_file_name, add->stack, next_name.buf); + + /* + On windows, this relies on rand() picking a unique destination name. + Maybe we should do retry loop as well? + */ + err = rename(temp_tab_file_name.buf, tab_file_name.buf); + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + + add->new_tables = reftable_realloc(add->new_tables, + sizeof(*add->new_tables) * + (add->new_tables_len + 1)); + add->new_tables[add->new_tables_len] = strbuf_detach(&next_name, NULL); + add->new_tables_len++; +done: + if (tab_fd > 0) { + close(tab_fd); + tab_fd = 0; + } + if (temp_tab_file_name.len > 0) { + unlink(temp_tab_file_name.buf); + } + + strbuf_release(&temp_tab_file_name); + strbuf_release(&tab_file_name); + strbuf_release(&next_name); + reftable_writer_free(wr); + return err; +} + +uint64_t reftable_stack_next_update_index(struct reftable_stack *st) +{ + int sz = st->merged->stack_len; + if (sz > 0) + return reftable_reader_max_update_index(st->readers[sz - 1]) + + 1; + return 1; +} + +static int stack_compact_locked(struct reftable_stack *st, int first, int last, + struct strbuf *temp_tab, + struct reftable_log_expiry_config *config) +{ + struct strbuf next_name = STRBUF_INIT; + int tab_fd = -1; + struct reftable_writer *wr = NULL; + int err = 0; + + format_name(&next_name, + reftable_reader_min_update_index(st->readers[first]), + reftable_reader_max_update_index(st->readers[last])); + + stack_filename(temp_tab, st, next_name.buf); + strbuf_addstr(temp_tab, ".temp.XXXXXX"); + + tab_fd = mkstemp(temp_tab->buf); + wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config); + + err = stack_write_compact(st, wr, first, last, config); + if (err < 0) + goto done; + err = reftable_writer_close(wr); + if (err < 0) + goto done; + + err = close(tab_fd); + tab_fd = 0; + +done: + reftable_writer_free(wr); + if (tab_fd > 0) { + close(tab_fd); + tab_fd = 0; + } + if (err != 0 && temp_tab->len > 0) { + unlink(temp_tab->buf); + strbuf_release(temp_tab); + } + strbuf_release(&next_name); + return err; +} + +static int stack_write_compact(struct reftable_stack *st, + struct reftable_writer *wr, int first, int last, + struct reftable_log_expiry_config *config) +{ + int subtabs_len = last - first + 1; + struct reftable_table *subtabs = reftable_calloc( + sizeof(struct reftable_table) * (last - first + 1)); + struct reftable_merged_table *mt = NULL; + int err = 0; + struct reftable_iterator it = { NULL }; + struct reftable_ref_record ref = { NULL }; + struct reftable_log_record log = { NULL }; + + uint64_t entries = 0; + + int i = 0, j = 0; + for (i = first, j = 0; i <= last; i++) { + struct reftable_reader *t = st->readers[i]; + reftable_table_from_reader(&subtabs[j++], t); + st->stats.bytes += t->size; + } + reftable_writer_set_limits(wr, st->readers[first]->min_update_index, + st->readers[last]->max_update_index); + + err = reftable_new_merged_table(&mt, subtabs, subtabs_len, + st->config.hash_id); + if (err < 0) { + reftable_free(subtabs); + goto done; + } + + err = reftable_merged_table_seek_ref(mt, &it, ""); + if (err < 0) + goto done; + + while (1) { + err = reftable_iterator_next_ref(&it, &ref); + if (err > 0) { + err = 0; + break; + } + if (err < 0) { + break; + } + + if (first == 0 && reftable_ref_record_is_deletion(&ref)) { + continue; + } + + err = reftable_writer_add_ref(wr, &ref); + if (err < 0) { + break; + } + entries++; + } + reftable_iterator_destroy(&it); + + err = reftable_merged_table_seek_log(mt, &it, ""); + if (err < 0) + goto done; + + while (1) { + err = reftable_iterator_next_log(&it, &log); + if (err > 0) { + err = 0; + break; + } + if (err < 0) { + break; + } + if (first == 0 && reftable_log_record_is_deletion(&log)) { + continue; + } + + if (config && config->min_update_index > 0 && + log.update_index < config->min_update_index) { + continue; + } + + if (config && config->time > 0 && + log.value.update.time < config->time) { + continue; + } + + err = reftable_writer_add_log(wr, &log); + if (err < 0) { + break; + } + entries++; + } + +done: + reftable_iterator_destroy(&it); + if (mt) { + merged_table_release(mt); + reftable_merged_table_free(mt); + } + reftable_ref_record_release(&ref); + reftable_log_record_release(&log); + st->stats.entries_written += entries; + return err; +} + +/* < 0: error. 0 == OK, > 0 attempt failed; could retry. */ +static int stack_compact_range(struct reftable_stack *st, int first, int last, + struct reftable_log_expiry_config *expiry) +{ + struct strbuf temp_tab_file_name = STRBUF_INIT; + struct strbuf new_table_name = STRBUF_INIT; + struct strbuf lock_file_name = STRBUF_INIT; + struct strbuf ref_list_contents = STRBUF_INIT; + struct strbuf new_table_path = STRBUF_INIT; + int err = 0; + int have_lock = 0; + int lock_file_fd = 0; + int compact_count = last - first + 1; + char **listp = NULL; + char **delete_on_success = + reftable_calloc(sizeof(char *) * (compact_count + 1)); + char **subtable_locks = + reftable_calloc(sizeof(char *) * (compact_count + 1)); + int i = 0; + int j = 0; + int is_empty_table = 0; + + if (first > last || (!expiry && first == last)) { + err = 0; + goto done; + } + + st->stats.attempts++; + + strbuf_reset(&lock_file_name); + strbuf_addstr(&lock_file_name, st->list_file); + strbuf_addstr(&lock_file_name, ".lock"); + + lock_file_fd = + open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644); + if (lock_file_fd < 0) { + if (errno == EEXIST) { + err = 1; + } else { + err = REFTABLE_IO_ERROR; + } + goto done; + } + /* Don't want to write to the lock for now. */ + close(lock_file_fd); + lock_file_fd = 0; + + have_lock = 1; + err = stack_uptodate(st); + if (err != 0) + goto done; + + for (i = first, j = 0; i <= last; i++) { + struct strbuf subtab_file_name = STRBUF_INIT; + struct strbuf subtab_lock = STRBUF_INIT; + int sublock_file_fd = -1; + + stack_filename(&subtab_file_name, st, + reader_name(st->readers[i])); + + strbuf_reset(&subtab_lock); + strbuf_addbuf(&subtab_lock, &subtab_file_name); + strbuf_addstr(&subtab_lock, ".lock"); + + sublock_file_fd = open(subtab_lock.buf, + O_EXCL | O_CREAT | O_WRONLY, 0644); + if (sublock_file_fd > 0) { + close(sublock_file_fd); + } else if (sublock_file_fd < 0) { + if (errno == EEXIST) { + err = 1; + } else { + err = REFTABLE_IO_ERROR; + } + } + + subtable_locks[j] = subtab_lock.buf; + delete_on_success[j] = subtab_file_name.buf; + j++; + + if (err != 0) + goto done; + } + + err = unlink(lock_file_name.buf); + if (err < 0) + goto done; + have_lock = 0; + + err = stack_compact_locked(st, first, last, &temp_tab_file_name, + expiry); + /* Compaction + tombstones can create an empty table out of non-empty + * tables. */ + is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR); + if (is_empty_table) { + err = 0; + } + if (err < 0) + goto done; + + lock_file_fd = + open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644); + if (lock_file_fd < 0) { + if (errno == EEXIST) { + err = 1; + } else { + err = REFTABLE_IO_ERROR; + } + goto done; + } + have_lock = 1; + + format_name(&new_table_name, st->readers[first]->min_update_index, + st->readers[last]->max_update_index); + strbuf_addstr(&new_table_name, ".ref"); + + stack_filename(&new_table_path, st, new_table_name.buf); + + if (!is_empty_table) { + /* retry? */ + err = rename(temp_tab_file_name.buf, new_table_path.buf); + if (err < 0) { + err = REFTABLE_IO_ERROR; + goto done; + } + } + + for (i = 0; i < first; i++) { + strbuf_addstr(&ref_list_contents, st->readers[i]->name); + strbuf_addstr(&ref_list_contents, "\n"); + } + if (!is_empty_table) { + strbuf_addbuf(&ref_list_contents, &new_table_name); + strbuf_addstr(&ref_list_contents, "\n"); + } + for (i = last + 1; i < st->merged->stack_len; i++) { + strbuf_addstr(&ref_list_contents, st->readers[i]->name); + strbuf_addstr(&ref_list_contents, "\n"); + } + + err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len); + if (err < 0) { + err = REFTABLE_IO_ERROR; + unlink(new_table_path.buf); + goto done; + } + err = close(lock_file_fd); + lock_file_fd = 0; + if (err < 0) { + err = REFTABLE_IO_ERROR; + unlink(new_table_path.buf); + goto done; + } + + err = rename(lock_file_name.buf, st->list_file); + if (err < 0) { + err = REFTABLE_IO_ERROR; + unlink(new_table_path.buf); + goto done; + } + have_lock = 0; + + /* Reload the stack before deleting. On windows, we can only delete the + files after we closed them. + */ + err = reftable_stack_reload_maybe_reuse(st, first < last); + + listp = delete_on_success; + while (*listp) { + if (strcmp(*listp, new_table_path.buf)) { + unlink(*listp); + } + listp++; + } + +done: + free_names(delete_on_success); + + listp = subtable_locks; + while (*listp) { + unlink(*listp); + listp++; + } + free_names(subtable_locks); + if (lock_file_fd > 0) { + close(lock_file_fd); + lock_file_fd = 0; + } + if (have_lock) { + unlink(lock_file_name.buf); + } + strbuf_release(&new_table_name); + strbuf_release(&new_table_path); + strbuf_release(&ref_list_contents); + strbuf_release(&temp_tab_file_name); + strbuf_release(&lock_file_name); + return err; +} + +int reftable_stack_compact_all(struct reftable_stack *st, + struct reftable_log_expiry_config *config) +{ + return stack_compact_range(st, 0, st->merged->stack_len - 1, config); +} + +static int stack_compact_range_stats(struct reftable_stack *st, int first, + int last, + struct reftable_log_expiry_config *config) +{ + int err = stack_compact_range(st, first, last, config); + if (err > 0) { + st->stats.failures++; + } + return err; +} + +static int segment_size(struct segment *s) +{ + return s->end - s->start; +} + +int fastlog2(uint64_t sz) +{ + int l = 0; + if (sz == 0) + return 0; + for (; sz; sz /= 2) { + l++; + } + return l - 1; +} + +struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n) +{ + struct segment *segs = reftable_calloc(sizeof(struct segment) * n); + int next = 0; + struct segment cur = { 0 }; + int i = 0; + + if (n == 0) { + *seglen = 0; + return segs; + } + for (i = 0; i < n; i++) { + int log = fastlog2(sizes[i]); + if (cur.log != log && cur.bytes > 0) { + struct segment fresh = { + .start = i, + }; + + segs[next++] = cur; + cur = fresh; + } + + cur.log = log; + cur.end = i + 1; + cur.bytes += sizes[i]; + } + segs[next++] = cur; + *seglen = next; + return segs; +} + +struct segment suggest_compaction_segment(uint64_t *sizes, int n) +{ + int seglen = 0; + struct segment *segs = sizes_to_segments(&seglen, sizes, n); + struct segment min_seg = { + .log = 64, + }; + int i = 0; + for (i = 0; i < seglen; i++) { + if (segment_size(&segs[i]) == 1) { + continue; + } + + if (segs[i].log < min_seg.log) { + min_seg = segs[i]; + } + } + + while (min_seg.start > 0) { + int prev = min_seg.start - 1; + if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) { + break; + } + + min_seg.start = prev; + min_seg.bytes += sizes[prev]; + } + + reftable_free(segs); + return min_seg; +} + +static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st) +{ + uint64_t *sizes = + reftable_calloc(sizeof(uint64_t) * st->merged->stack_len); + int version = (st->config.hash_id == GIT_SHA1_FORMAT_ID) ? 1 : 2; + int overhead = header_size(version) - 1; + int i = 0; + for (i = 0; i < st->merged->stack_len; i++) { + sizes[i] = st->readers[i]->size - overhead; + } + return sizes; +} + +int reftable_stack_auto_compact(struct reftable_stack *st) +{ + uint64_t *sizes = stack_table_sizes_for_compaction(st); + struct segment seg = + suggest_compaction_segment(sizes, st->merged->stack_len); + reftable_free(sizes); + if (segment_size(&seg) > 0) + return stack_compact_range_stats(st, seg.start, seg.end - 1, + NULL); + + return 0; +} + +struct reftable_compaction_stats * +reftable_stack_compaction_stats(struct reftable_stack *st) +{ + return &st->stats; +} + +int reftable_stack_read_ref(struct reftable_stack *st, const char *refname, + struct reftable_ref_record *ref) +{ + struct reftable_table tab = { NULL }; + reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st)); + return reftable_table_read_ref(&tab, refname, ref); +} + +int reftable_stack_read_log(struct reftable_stack *st, const char *refname, + struct reftable_log_record *log) +{ + struct reftable_iterator it = { NULL }; + struct reftable_merged_table *mt = reftable_stack_merged_table(st); + int err = reftable_merged_table_seek_log(mt, &it, refname); + if (err) + goto done; + + err = reftable_iterator_next_log(&it, log); + if (err) + goto done; + + if (strcmp(log->refname, refname) || + reftable_log_record_is_deletion(log)) { + err = 1; + goto done; + } + +done: + if (err) { + reftable_log_record_release(log); + } + reftable_iterator_destroy(&it); + return err; +} + +static int stack_check_addition(struct reftable_stack *st, + const char *new_tab_name) +{ + int err = 0; + struct reftable_block_source src = { NULL }; + struct reftable_reader *rd = NULL; + struct reftable_table tab = { NULL }; + struct reftable_ref_record *refs = NULL; + struct reftable_iterator it = { NULL }; + int cap = 0; + int len = 0; + int i = 0; + + if (st->config.skip_name_check) + return 0; + + err = reftable_block_source_from_file(&src, new_tab_name); + if (err < 0) + goto done; + + err = reftable_new_reader(&rd, &src, new_tab_name); + if (err < 0) + goto done; + + err = reftable_reader_seek_ref(rd, &it, ""); + if (err > 0) { + err = 0; + goto done; + } + if (err < 0) + goto done; + + while (1) { + struct reftable_ref_record ref = { NULL }; + err = reftable_iterator_next_ref(&it, &ref); + if (err > 0) { + break; + } + if (err < 0) + goto done; + + if (len >= cap) { + cap = 2 * cap + 1; + refs = reftable_realloc(refs, cap * sizeof(refs[0])); + } + + refs[len++] = ref; + } + + reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st)); + + err = validate_ref_record_addition(tab, refs, len); + +done: + for (i = 0; i < len; i++) { + reftable_ref_record_release(&refs[i]); + } + + free(refs); + reftable_iterator_destroy(&it); + reftable_reader_free(rd); + return err; +} + +static int is_table_name(const char *s) +{ + const char *dot = strrchr(s, '.'); + return dot && !strcmp(dot, ".ref"); +} + +static void remove_maybe_stale_table(struct reftable_stack *st, uint64_t max, + const char *name) +{ + int err = 0; + uint64_t update_idx = 0; + struct reftable_block_source src = { NULL }; + struct reftable_reader *rd = NULL; + struct strbuf table_path = STRBUF_INIT; + stack_filename(&table_path, st, name); + + err = reftable_block_source_from_file(&src, table_path.buf); + if (err < 0) + goto done; + + err = reftable_new_reader(&rd, &src, name); + if (err < 0) + goto done; + + update_idx = reftable_reader_max_update_index(rd); + reftable_reader_free(rd); + + if (update_idx <= max) { + unlink(table_path.buf); + } +done: + strbuf_release(&table_path); +} + +static int reftable_stack_clean_locked(struct reftable_stack *st) +{ + uint64_t max = reftable_merged_table_max_update_index( + reftable_stack_merged_table(st)); + DIR *dir = opendir(st->reftable_dir); + struct dirent *d = NULL; + if (!dir) { + return REFTABLE_IO_ERROR; + } + + while ((d = readdir(dir))) { + int i = 0; + int found = 0; + if (!is_table_name(d->d_name)) + continue; + + for (i = 0; !found && i < st->readers_len; i++) { + found = !strcmp(reader_name(st->readers[i]), d->d_name); + } + if (found) + continue; + + remove_maybe_stale_table(st, max, d->d_name); + } + + closedir(dir); + return 0; +} + +int reftable_stack_clean(struct reftable_stack *st) +{ + struct reftable_addition *add = NULL; + int err = reftable_stack_new_addition(&add, st); + if (err < 0) { + goto done; + } + + err = reftable_stack_reload(st); + if (err < 0) { + goto done; + } + + err = reftable_stack_clean_locked(st); + +done: + reftable_addition_destroy(add); + return err; +} + +int reftable_stack_print_directory(const char *stackdir, uint32_t hash_id) +{ + struct reftable_stack *stack = NULL; + struct reftable_write_options cfg = { .hash_id = hash_id }; + struct reftable_merged_table *merged = NULL; + struct reftable_table table = { NULL }; + + int err = reftable_new_stack(&stack, stackdir, cfg); + if (err < 0) + goto done; + + merged = reftable_stack_merged_table(stack); + reftable_table_from_merged_table(&table, merged); + err = reftable_table_print(&table); +done: + if (stack) + reftable_stack_destroy(stack); + return err; +} diff --git a/reftable/stack.h b/reftable/stack.h new file mode 100644 index 00000000000..f57005846e5 --- /dev/null +++ b/reftable/stack.h @@ -0,0 +1,41 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#ifndef STACK_H +#define STACK_H + +#include "system.h" +#include "reftable-writer.h" +#include "reftable-stack.h" + +struct reftable_stack { + char *list_file; + char *reftable_dir; + int disable_auto_compact; + + struct reftable_write_options config; + + struct reftable_reader **readers; + size_t readers_len; + struct reftable_merged_table *merged; + struct reftable_compaction_stats stats; +}; + +int read_lines(const char *filename, char ***lines); + +struct segment { + int start, end; + int log; + uint64_t bytes; +}; + +int fastlog2(uint64_t sz); +struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n); +struct segment suggest_compaction_segment(uint64_t *sizes, int n); + +#endif diff --git a/reftable/stack_test.c b/reftable/stack_test.c new file mode 100644 index 00000000000..7a4641ab601 --- /dev/null +++ b/reftable/stack_test.c @@ -0,0 +1,949 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "stack.h" + +#include "system.h" + +#include "reftable-reader.h" +#include "merged.h" +#include "basics.h" +#include "constants.h" +#include "record.h" +#include "test_framework.h" +#include "reftable-tests.h" + +#include +#include + +static void clear_dir(const char *dirname) +{ + struct strbuf path = STRBUF_INIT; + strbuf_addstr(&path, dirname); + remove_dir_recursively(&path, 0); + strbuf_release(&path); +} + +static int count_dir_entries(const char *dirname) +{ + DIR *dir = opendir(dirname); + int len = 0; + struct dirent *d; + if (dir == NULL) + return 0; + + while ((d = readdir(dir))) { + if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, ".")) + continue; + len++; + } + closedir(dir); + return len; +} + +// Work linenumber into the tempdir, so we can see which tests forget to +// cleanup. +static char *get_tmp_template(int linenumber) +{ + const char *tmp = getenv("TMPDIR"); + static char template[1024]; + snprintf(template, sizeof(template) - 1, "%s/stack_test-%d.XXXXXX", + tmp ? tmp : "/tmp", linenumber); + return template; +} + +static char *get_tmp_dir(int linenumber) +{ + char *dir = get_tmp_template(linenumber); + EXPECT(mkdtemp(dir)); + return dir; +} + +static void test_read_file(void) +{ + char *fn = get_tmp_template(__LINE__); + int fd = mkstemp(fn); + char out[1024] = "line1\n\nline2\nline3"; + int n, err; + char **names = NULL; + char *want[] = { "line1", "line2", "line3" }; + int i = 0; + + EXPECT(fd > 0); + n = write(fd, out, strlen(out)); + EXPECT(n == strlen(out)); + err = close(fd); + EXPECT(err >= 0); + + err = read_lines(fn, &names); + EXPECT_ERR(err); + + for (i = 0; names[i]; i++) { + EXPECT(0 == strcmp(want[i], names[i])); + } + free_names(names); + remove(fn); +} + +static void test_parse_names(void) +{ + char buf[] = "line\n"; + char **names = NULL; + parse_names(buf, strlen(buf), &names); + + EXPECT(NULL != names[0]); + EXPECT(0 == strcmp(names[0], "line")); + EXPECT(NULL == names[1]); + free_names(names); +} + +static void test_names_equal(void) +{ + char *a[] = { "a", "b", "c", NULL }; + char *b[] = { "a", "b", "d", NULL }; + char *c[] = { "a", "b", NULL }; + + EXPECT(names_equal(a, a)); + EXPECT(!names_equal(a, b)); + EXPECT(!names_equal(a, c)); +} + +static int write_test_ref(struct reftable_writer *wr, void *arg) +{ + struct reftable_ref_record *ref = arg; + reftable_writer_set_limits(wr, ref->update_index, ref->update_index); + return reftable_writer_add_ref(wr, ref); +} + +struct write_log_arg { + struct reftable_log_record *log; + uint64_t update_index; +}; + +static int write_test_log(struct reftable_writer *wr, void *arg) +{ + struct write_log_arg *wla = arg; + + reftable_writer_set_limits(wr, wla->update_index, wla->update_index); + return reftable_writer_add_log(wr, wla->log); +} + +static void test_reftable_stack_add_one(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + struct reftable_ref_record ref = { + .refname = "HEAD", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + struct reftable_ref_record dest = { NULL }; + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_test_ref, &ref); + EXPECT_ERR(err); + + err = reftable_stack_read_ref(st, ref.refname, &dest); + EXPECT_ERR(err); + EXPECT(0 == strcmp("master", dest.value.symref)); + + printf("testing print functionality:\n"); + err = reftable_stack_print_directory(dir, GIT_SHA1_FORMAT_ID); + EXPECT_ERR(err); + + err = reftable_stack_print_directory(dir, GIT_SHA256_FORMAT_ID); + EXPECT(err == REFTABLE_FORMAT_ERROR); + + reftable_ref_record_release(&dest); + reftable_stack_destroy(st); + clear_dir(dir); +} + +static void test_reftable_stack_uptodate(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st1 = NULL; + struct reftable_stack *st2 = NULL; + char *dir = get_tmp_dir(__LINE__); + + int err; + struct reftable_ref_record ref1 = { + .refname = "HEAD", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + struct reftable_ref_record ref2 = { + .refname = "branch2", + .update_index = 2, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + + + /* simulate multi-process access to the same stack + by creating two stacks for the same directory. + */ + err = reftable_new_stack(&st1, dir, cfg); + EXPECT_ERR(err); + + err = reftable_new_stack(&st2, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st1, &write_test_ref, &ref1); + EXPECT_ERR(err); + + err = reftable_stack_add(st2, &write_test_ref, &ref2); + EXPECT(err == REFTABLE_LOCK_ERROR); + + err = reftable_stack_reload(st2); + EXPECT_ERR(err); + + err = reftable_stack_add(st2, &write_test_ref, &ref2); + EXPECT_ERR(err); + reftable_stack_destroy(st1); + reftable_stack_destroy(st2); + clear_dir(dir); +} + +static void test_reftable_stack_transaction_api(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + struct reftable_addition *add = NULL; + + struct reftable_ref_record ref = { + .refname = "HEAD", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + struct reftable_ref_record dest = { NULL }; + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + reftable_addition_destroy(add); + + err = reftable_stack_new_addition(&add, st); + EXPECT_ERR(err); + + err = reftable_addition_add(add, &write_test_ref, &ref); + EXPECT_ERR(err); + + err = reftable_addition_commit(add); + EXPECT_ERR(err); + + reftable_addition_destroy(add); + + err = reftable_stack_read_ref(st, ref.refname, &dest); + EXPECT_ERR(err); + EXPECT(REFTABLE_REF_SYMREF == dest.value_type); + EXPECT(0 == strcmp("master", dest.value.symref)); + + reftable_ref_record_release(&dest); + reftable_stack_destroy(st); + clear_dir(dir); +} + +static void test_reftable_stack_validate_refname(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + char *dir = get_tmp_dir(__LINE__); + + int i; + struct reftable_ref_record ref = { + .refname = "a/b", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + char *additions[] = { "a", "a/b/c" }; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_test_ref, &ref); + EXPECT_ERR(err); + + for (i = 0; i < ARRAY_SIZE(additions); i++) { + struct reftable_ref_record ref = { + .refname = additions[i], + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + + err = reftable_stack_add(st, &write_test_ref, &ref); + EXPECT(err == REFTABLE_NAME_CONFLICT); + } + + reftable_stack_destroy(st); + clear_dir(dir); +} + +static int write_error(struct reftable_writer *wr, void *arg) +{ + return *((int *)arg); +} + +static void test_reftable_stack_update_index_check(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + struct reftable_ref_record ref1 = { + .refname = "name1", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + struct reftable_ref_record ref2 = { + .refname = "name2", + .update_index = 1, + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_test_ref, &ref1); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_test_ref, &ref2); + EXPECT(err == REFTABLE_API_ERROR); + reftable_stack_destroy(st); + clear_dir(dir); +} + +static void test_reftable_stack_lock_failure(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err, i; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) { + err = reftable_stack_add(st, &write_error, &i); + EXPECT(err == i); + } + + reftable_stack_destroy(st); + clear_dir(dir); +} + +static void test_reftable_stack_add(void) +{ + int i = 0; + int err = 0; + struct reftable_write_options cfg = { + .exact_log_message = 1, + }; + struct reftable_stack *st = NULL; + char *dir = get_tmp_dir(__LINE__); + + struct reftable_ref_record refs[2] = { { NULL } }; + struct reftable_log_record logs[2] = { { NULL } }; + int N = ARRAY_SIZE(refs); + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + st->disable_auto_compact = 1; + + for (i = 0; i < N; i++) { + char buf[256]; + snprintf(buf, sizeof(buf), "branch%02d", i); + refs[i].refname = xstrdup(buf); + refs[i].update_index = i + 1; + refs[i].value_type = REFTABLE_REF_VAL1; + refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); + set_test_hash(refs[i].value.val1, i); + + logs[i].refname = xstrdup(buf); + logs[i].update_index = N + i + 1; + logs[i].value_type = REFTABLE_LOG_UPDATE; + + logs[i].value.update.new_hash = reftable_malloc(GIT_SHA1_RAWSZ); + logs[i].value.update.email = xstrdup("identity@invalid"); + set_test_hash(logs[i].value.update.new_hash, i); + } + + for (i = 0; i < N; i++) { + int err = reftable_stack_add(st, &write_test_ref, &refs[i]); + EXPECT_ERR(err); + } + + for (i = 0; i < N; i++) { + struct write_log_arg arg = { + .log = &logs[i], + .update_index = reftable_stack_next_update_index(st), + }; + int err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT_ERR(err); + } + + err = reftable_stack_compact_all(st, NULL); + EXPECT_ERR(err); + + for (i = 0; i < N; i++) { + struct reftable_ref_record dest = { NULL }; + + int err = reftable_stack_read_ref(st, refs[i].refname, &dest); + EXPECT_ERR(err); + EXPECT(reftable_ref_record_equal(&dest, refs + i, + GIT_SHA1_RAWSZ)); + reftable_ref_record_release(&dest); + } + + for (i = 0; i < N; i++) { + struct reftable_log_record dest = { NULL }; + int err = reftable_stack_read_log(st, refs[i].refname, &dest); + EXPECT_ERR(err); + EXPECT(reftable_log_record_equal(&dest, logs + i, + GIT_SHA1_RAWSZ)); + reftable_log_record_release(&dest); + } + + /* cleanup */ + reftable_stack_destroy(st); + for (i = 0; i < N; i++) { + reftable_ref_record_release(&refs[i]); + reftable_log_record_release(&logs[i]); + } + clear_dir(dir); +} + +static void test_reftable_stack_log_normalize(void) +{ + int err = 0; + struct reftable_write_options cfg = { + 0, + }; + struct reftable_stack *st = NULL; + char *dir = get_tmp_dir(__LINE__); + + uint8_t h1[GIT_SHA1_RAWSZ] = { 0x01 }, h2[GIT_SHA1_RAWSZ] = { 0x02 }; + + struct reftable_log_record input = { .refname = "branch", + .update_index = 1, + .value_type = REFTABLE_LOG_UPDATE, + .value = { .update = { + .new_hash = h1, + .old_hash = h2, + } } }; + struct reftable_log_record dest = { + .update_index = 0, + }; + struct write_log_arg arg = { + .log = &input, + .update_index = 1, + }; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + input.value.update.message = "one\ntwo"; + err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT(err == REFTABLE_API_ERROR); + + input.value.update.message = "one"; + err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT_ERR(err); + + err = reftable_stack_read_log(st, input.refname, &dest); + EXPECT_ERR(err); + EXPECT(0 == strcmp(dest.value.update.message, "one\n")); + + input.value.update.message = "two\n"; + arg.update_index = 2; + err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT_ERR(err); + err = reftable_stack_read_log(st, input.refname, &dest); + EXPECT_ERR(err); + EXPECT(0 == strcmp(dest.value.update.message, "two\n")); + + /* cleanup */ + reftable_stack_destroy(st); + reftable_log_record_release(&dest); + clear_dir(dir); +} + +static void test_reftable_stack_tombstone(void) +{ + int i = 0; + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + struct reftable_ref_record refs[2] = { { NULL } }; + struct reftable_log_record logs[2] = { { NULL } }; + int N = ARRAY_SIZE(refs); + struct reftable_ref_record dest = { NULL }; + struct reftable_log_record log_dest = { NULL }; + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + /* even entries add the refs, odd entries delete them. */ + for (i = 0; i < N; i++) { + const char *buf = "branch"; + refs[i].refname = xstrdup(buf); + refs[i].update_index = i + 1; + if (i % 2 == 0) { + refs[i].value_type = REFTABLE_REF_VAL1; + refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ); + set_test_hash(refs[i].value.val1, i); + } + + logs[i].refname = xstrdup(buf); + /* update_index is part of the key. */ + logs[i].update_index = 42; + if (i % 2 == 0) { + logs[i].value_type = REFTABLE_LOG_UPDATE; + logs[i].value.update.new_hash = + reftable_malloc(GIT_SHA1_RAWSZ); + set_test_hash(logs[i].value.update.new_hash, i); + logs[i].value.update.email = + xstrdup("identity@invalid"); + } + } + for (i = 0; i < N; i++) { + int err = reftable_stack_add(st, &write_test_ref, &refs[i]); + EXPECT_ERR(err); + } + + for (i = 0; i < N; i++) { + struct write_log_arg arg = { + .log = &logs[i], + .update_index = reftable_stack_next_update_index(st), + }; + int err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT_ERR(err); + } + + err = reftable_stack_read_ref(st, "branch", &dest); + EXPECT(err == 1); + reftable_ref_record_release(&dest); + + err = reftable_stack_read_log(st, "branch", &log_dest); + EXPECT(err == 1); + reftable_log_record_release(&log_dest); + + err = reftable_stack_compact_all(st, NULL); + EXPECT_ERR(err); + + err = reftable_stack_read_ref(st, "branch", &dest); + EXPECT(err == 1); + + err = reftable_stack_read_log(st, "branch", &log_dest); + EXPECT(err == 1); + reftable_ref_record_release(&dest); + reftable_log_record_release(&log_dest); + + /* cleanup */ + reftable_stack_destroy(st); + for (i = 0; i < N; i++) { + reftable_ref_record_release(&refs[i]); + reftable_log_record_release(&logs[i]); + } + clear_dir(dir); +} + +static void test_reftable_stack_hash_id(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + + struct reftable_ref_record ref = { + .refname = "master", + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "target", + .update_index = 1, + }; + struct reftable_write_options cfg32 = { .hash_id = GIT_SHA256_FORMAT_ID }; + struct reftable_stack *st32 = NULL; + struct reftable_write_options cfg_default = { 0 }; + struct reftable_stack *st_default = NULL; + struct reftable_ref_record dest = { NULL }; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_test_ref, &ref); + EXPECT_ERR(err); + + /* can't read it with the wrong hash ID. */ + err = reftable_new_stack(&st32, dir, cfg32); + EXPECT(err == REFTABLE_FORMAT_ERROR); + + /* check that we can read it back with default config too. */ + err = reftable_new_stack(&st_default, dir, cfg_default); + EXPECT_ERR(err); + + err = reftable_stack_read_ref(st_default, "master", &dest); + EXPECT_ERR(err); + + EXPECT(reftable_ref_record_equal(&ref, &dest, GIT_SHA1_RAWSZ)); + reftable_ref_record_release(&dest); + reftable_stack_destroy(st); + reftable_stack_destroy(st_default); + clear_dir(dir); +} + +static void test_log2(void) +{ + EXPECT(1 == fastlog2(3)); + EXPECT(2 == fastlog2(4)); + EXPECT(2 == fastlog2(5)); +} + +static void test_sizes_to_segments(void) +{ + uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 }; + /* .................0 1 2 3 4 5 */ + + int seglen = 0; + struct segment *segs = + sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); + EXPECT(segs[2].log == 3); + EXPECT(segs[2].start == 5); + EXPECT(segs[2].end == 6); + + EXPECT(segs[1].log == 2); + EXPECT(segs[1].start == 2); + EXPECT(segs[1].end == 5); + reftable_free(segs); +} + +static void test_sizes_to_segments_empty(void) +{ + int seglen = 0; + struct segment *segs = sizes_to_segments(&seglen, NULL, 0); + EXPECT(seglen == 0); + reftable_free(segs); +} + +static void test_sizes_to_segments_all_equal(void) +{ + uint64_t sizes[] = { 5, 5 }; + + int seglen = 0; + struct segment *segs = + sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); + EXPECT(seglen == 1); + EXPECT(segs[0].start == 0); + EXPECT(segs[0].end == 2); + reftable_free(segs); +} + +static void test_suggest_compaction_segment(void) +{ + uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 }; + /* .................0 1 2 3 4 5 6 */ + struct segment min = + suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); + EXPECT(min.start == 2); + EXPECT(min.end == 7); +} + +static void test_suggest_compaction_segment_nothing(void) +{ + uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 }; + struct segment result = + suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); + EXPECT(result.start == result.end); +} + +static void test_reflog_expire(void) +{ + char *dir = get_tmp_dir(__LINE__); + + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + struct reftable_log_record logs[20] = { { NULL } }; + int N = ARRAY_SIZE(logs) - 1; + int i = 0; + int err; + struct reftable_log_expiry_config expiry = { + .time = 10, + }; + struct reftable_log_record log = { NULL }; + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + for (i = 1; i <= N; i++) { + char buf[256]; + snprintf(buf, sizeof(buf), "branch%02d", i); + + logs[i].refname = xstrdup(buf); + logs[i].update_index = i; + logs[i].value_type = REFTABLE_LOG_UPDATE; + logs[i].value.update.time = i; + logs[i].value.update.new_hash = reftable_malloc(GIT_SHA1_RAWSZ); + logs[i].value.update.email = xstrdup("identity@invalid"); + set_test_hash(logs[i].value.update.new_hash, i); + } + + for (i = 1; i <= N; i++) { + struct write_log_arg arg = { + .log = &logs[i], + .update_index = reftable_stack_next_update_index(st), + }; + int err = reftable_stack_add(st, &write_test_log, &arg); + EXPECT_ERR(err); + } + + err = reftable_stack_compact_all(st, NULL); + EXPECT_ERR(err); + + err = reftable_stack_compact_all(st, &expiry); + EXPECT_ERR(err); + + err = reftable_stack_read_log(st, logs[9].refname, &log); + EXPECT(err == 1); + + err = reftable_stack_read_log(st, logs[11].refname, &log); + EXPECT_ERR(err); + + expiry.min_update_index = 15; + err = reftable_stack_compact_all(st, &expiry); + EXPECT_ERR(err); + + err = reftable_stack_read_log(st, logs[14].refname, &log); + EXPECT(err == 1); + + err = reftable_stack_read_log(st, logs[16].refname, &log); + EXPECT_ERR(err); + + /* cleanup */ + reftable_stack_destroy(st); + for (i = 0; i <= N; i++) { + reftable_log_record_release(&logs[i]); + } + clear_dir(dir); + reftable_log_record_release(&log); +} + +static int write_nothing(struct reftable_writer *wr, void *arg) +{ + reftable_writer_set_limits(wr, 1, 1); + return 0; +} + +static void test_empty_add(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + int err; + char *dir = get_tmp_dir(__LINE__); + + struct reftable_stack *st2 = NULL; + + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_add(st, &write_nothing, NULL); + EXPECT_ERR(err); + + err = reftable_new_stack(&st2, dir, cfg); + EXPECT_ERR(err); + clear_dir(dir); + reftable_stack_destroy(st); + reftable_stack_destroy(st2); +} + +static void test_reftable_stack_auto_compaction(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st = NULL; + char *dir = get_tmp_dir(__LINE__); + + int err, i; + int N = 100; + + err = reftable_new_stack(&st, dir, cfg); + EXPECT_ERR(err); + + for (i = 0; i < N; i++) { + char name[100]; + struct reftable_ref_record ref = { + .refname = name, + .update_index = reftable_stack_next_update_index(st), + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + snprintf(name, sizeof(name), "branch%04d", i); + + err = reftable_stack_add(st, &write_test_ref, &ref); + EXPECT_ERR(err); + + EXPECT(i < 3 || st->merged->stack_len < 2 * fastlog2(i)); + } + + EXPECT(reftable_stack_compaction_stats(st)->entries_written < + (uint64_t)(N * fastlog2(N))); + + reftable_stack_destroy(st); + clear_dir(dir); +} + +static void test_reftable_stack_compaction_concurrent(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st1 = NULL, *st2 = NULL; + char *dir = get_tmp_dir(__LINE__); + + int err, i; + int N = 3; + + err = reftable_new_stack(&st1, dir, cfg); + EXPECT_ERR(err); + + for (i = 0; i < N; i++) { + char name[100]; + struct reftable_ref_record ref = { + .refname = name, + .update_index = reftable_stack_next_update_index(st1), + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + snprintf(name, sizeof(name), "branch%04d", i); + + err = reftable_stack_add(st1, &write_test_ref, &ref); + EXPECT_ERR(err); + } + + err = reftable_new_stack(&st2, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_compact_all(st1, NULL); + EXPECT_ERR(err); + + reftable_stack_destroy(st1); + reftable_stack_destroy(st2); + + EXPECT(count_dir_entries(dir) == 2); + clear_dir(dir); +} + +static void unclean_stack_close(struct reftable_stack *st) +{ + // break abstraction boundary to simulate unclean shutdown. + int i = 0; + for (; i < st->readers_len; i++) { + reftable_reader_free(st->readers[i]); + } + st->readers_len = 0; + FREE_AND_NULL(st->readers); +} + +static void test_reftable_stack_compaction_concurrent_clean(void) +{ + struct reftable_write_options cfg = { 0 }; + struct reftable_stack *st1 = NULL, *st2 = NULL, *st3 = NULL; + char *dir = get_tmp_dir(__LINE__); + + int err, i; + int N = 3; + + err = reftable_new_stack(&st1, dir, cfg); + EXPECT_ERR(err); + + for (i = 0; i < N; i++) { + char name[100]; + struct reftable_ref_record ref = { + .refname = name, + .update_index = reftable_stack_next_update_index(st1), + .value_type = REFTABLE_REF_SYMREF, + .value.symref = "master", + }; + snprintf(name, sizeof(name), "branch%04d", i); + + err = reftable_stack_add(st1, &write_test_ref, &ref); + EXPECT_ERR(err); + } + + err = reftable_new_stack(&st2, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_compact_all(st1, NULL); + EXPECT_ERR(err); + + unclean_stack_close(st1); + unclean_stack_close(st2); + + err = reftable_new_stack(&st3, dir, cfg); + EXPECT_ERR(err); + + err = reftable_stack_clean(st3); + EXPECT_ERR(err); + EXPECT(count_dir_entries(dir) == 2); + + reftable_stack_destroy(st1); + reftable_stack_destroy(st2); + reftable_stack_destroy(st3); + + clear_dir(dir); +} + +int stack_test_main(int argc, const char *argv[]) +{ + RUN_TEST(test_empty_add); + RUN_TEST(test_log2); + RUN_TEST(test_names_equal); + RUN_TEST(test_parse_names); + RUN_TEST(test_read_file); + RUN_TEST(test_reflog_expire); + RUN_TEST(test_reftable_stack_add); + RUN_TEST(test_reftable_stack_add_one); + RUN_TEST(test_reftable_stack_auto_compaction); + RUN_TEST(test_reftable_stack_compaction_concurrent); + RUN_TEST(test_reftable_stack_compaction_concurrent_clean); + RUN_TEST(test_reftable_stack_hash_id); + RUN_TEST(test_reftable_stack_lock_failure); + RUN_TEST(test_reftable_stack_log_normalize); + RUN_TEST(test_reftable_stack_tombstone); + RUN_TEST(test_reftable_stack_transaction_api); + RUN_TEST(test_reftable_stack_update_index_check); + RUN_TEST(test_reftable_stack_uptodate); + RUN_TEST(test_reftable_stack_validate_refname); + RUN_TEST(test_sizes_to_segments); + RUN_TEST(test_sizes_to_segments_all_equal); + RUN_TEST(test_sizes_to_segments_empty); + RUN_TEST(test_suggest_compaction_segment); + RUN_TEST(test_suggest_compaction_segment_nothing); + return 0; +} diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index c8db6852c35..996da85f7b5 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -10,6 +10,7 @@ int cmd__reftable(int argc, const char **argv) record_test_main(argc, argv); refname_test_main(argc, argv); readwrite_test_main(argc, argv); + stack_test_main(argc, argv); tree_test_main(argc, argv); return 0; } From patchwork Thu Sep 9 18:47:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483689 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B9F3C433F5 for ; Thu, 9 Sep 2021 18:49:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 81E9461100 for ; Thu, 9 Sep 2021 18:49:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245618AbhIISuf (ORCPT ); Thu, 9 Sep 2021 14:50:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48394 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245603AbhIIStO (ORCPT ); Thu, 9 Sep 2021 14:49:14 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6BAFC0617A8 for ; Thu, 9 Sep 2021 11:47:58 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id i23so30126wrb.2 for ; Thu, 09 Sep 2021 11:47:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:mime-version :content-transfer-encoding:fcc:to:cc; bh=vRAQxRV4mWm5KzO2JMO1K4ORWwbDDHcdGl0emEa8Wlg=; b=dBkjwz5++gXyzo6TGyDlHq3lGOfDnYTxO/XvEx1SYZPc1ShvOumSJZEqc5UN6dWzOv WlYd1MDFCwgkNv+BCfO95dKPiYVVMEBhz/d1Lj0BJuFKkkv7qR6KhPoX8oHG8vfE/UpG KOXA9nuy34f1pAPPhqGOBw3nZfx7Jz2sGUxZsLLRu5EHmvv7BieoPb6Oj3Au3rMYkThH Ge/AobpfMkvAxSk5LslzElUH9TlGLVnXqKuLIYwNRDoo/OaR6/3r7LMvRzkekaL2h1vQ +5NpvdRElrclhoUSXlaPUVmpSJ6LCY9vxNLpl2dXe+1a3ZGYkVurH18K4clFb8GJWKR5 flEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:mime-version:content-transfer-encoding:fcc:to:cc; bh=vRAQxRV4mWm5KzO2JMO1K4ORWwbDDHcdGl0emEa8Wlg=; b=7MLGr//c5pyOtoWCNiIfPYakbbWouDsVSox86YY10VuL81yfM7XPZcRvq7LYDamNp0 FQ9AsUGg3zBmk3HZBH1nueMldErC8HjNBXRTj54AdtrduCVuTTCWyIMKBHMr6jJz0fFr zj4bOnD3dyPAPsFdwHNW9JqsMAyVlKNUI2KnaEhQ49F5W6Xf2FSPz3fxISyLVFkYeVnX dSO5Xg5eJyiXBd+rn72L4B9D3reivz6W5ugTIn2VPit08qtR125bxDZ4IuXFvoSG4+kp UAPWBtJota8vwYE1etgO6ggUrugPkXkS9AnFvM3smkcdAGK2gJJjjKqeRkJ1sFMzpM3q PKrg== X-Gm-Message-State: AOAM532gC9Ar0CDAsX6WYnauWGsHy3IzgFrWBVyguaBwgHxDoKn/P6PM JIo+CHVwzWniv6dCs/UyayoqaqfstOk= X-Google-Smtp-Source: ABdhPJzwhI7NbzWUecrgz6iN/sXp45Kk6f04IbRsKuyyes56IFiVHDEohSn1qhJgGa3VN0wKsiFw6Q== X-Received: by 2002:adf:8170:: with SMTP id 103mr5492002wrm.167.1631213277560; Thu, 09 Sep 2021 11:47:57 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b24sm2200774wmj.43.2021.09.09.11.47.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:57 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:43 +0000 Subject: [PATCH v2 18/19] reftable: add dump utility MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys provide a command-line utility for inspecting individual tables, and inspecting a complete ref database Signed-off-by: Han-Wen Nienhuys Helped-by: Carlo Marcelo Arenas Belón --- reftable/dump.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 reftable/dump.c diff --git a/reftable/dump.c b/reftable/dump.c new file mode 100644 index 00000000000..155953d1b82 --- /dev/null +++ b/reftable/dump.c @@ -0,0 +1,107 @@ +/* +Copyright 2020 Google LLC + +Use of this source code is governed by a BSD-style +license that can be found in the LICENSE file or at +https://developers.google.com/open-source/licenses/bsd +*/ + +#include "git-compat-util.h" +#include "hash.h" + +#include "reftable-blocksource.h" +#include "reftable-error.h" +#include "reftable-merged.h" +#include "reftable-record.h" +#include "reftable-tests.h" +#include "reftable-writer.h" +#include "reftable-iterator.h" +#include "reftable-reader.h" +#include "reftable-stack.h" +#include "reftable-generic.h" + +#include +#include +#include +#include +#include + +static int compact_stack(const char *stackdir) +{ + struct reftable_stack *stack = NULL; + struct reftable_write_options cfg = { 0 }; + + int err = reftable_new_stack(&stack, stackdir, cfg); + if (err < 0) + goto done; + + err = reftable_stack_compact_all(stack, NULL); + if (err < 0) + goto done; +done: + if (stack) { + reftable_stack_destroy(stack); + } + return err; +} + +static void print_help(void) +{ + printf("usage: dump [-cst] arg\n\n" + "options: \n" + " -c compact\n" + " -t dump table\n" + " -s dump stack\n" + " -6 sha256 hash format\n" + " -h this help\n" + "\n"); +} + +int reftable_dump_main(int argc, char *const *argv) +{ + int err = 0; + int opt_dump_table = 0; + int opt_dump_stack = 0; + int opt_compact = 0; + uint32_t opt_hash_id = GIT_SHA1_FORMAT_ID; + const char *arg = NULL, *argv0 = argv[0]; + + for (; argc > 1; argv++, argc--) + if (*argv[1] != '-') + break; + else if (!strcmp("-t", argv[1])) + opt_dump_table = 1; + else if (!strcmp("-6", argv[1])) + opt_hash_id = GIT_SHA256_FORMAT_ID; + else if (!strcmp("-s", argv[1])) + opt_dump_stack = 1; + else if (!strcmp("-c", argv[1])) + opt_compact = 1; + else if (!strcmp("-?", argv[1]) || !strcmp("-h", argv[1])) { + print_help(); + return 2; + } + + if (argc != 2) { + fprintf(stderr, "need argument\n"); + print_help(); + return 2; + } + + arg = argv[1]; + + if (opt_dump_table) { + err = reftable_reader_print_file(arg); + } else if (opt_dump_stack) { + err = reftable_stack_print_directory(arg, opt_hash_id); + } else if (opt_compact) { + err = compact_stack(arg); + } + + if (err < 0) { + fprintf(stderr, "%s: %s: %s\n", argv0, arg, + reftable_error_str(err)); + return 1; + } + return 0; +} From patchwork Thu Sep 9 18:47:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han-Wen Nienhuys X-Patchwork-Id: 12483693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7EDE9C433F5 for ; Thu, 9 Sep 2021 18:49:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 62B8D61100 for ; Thu, 9 Sep 2021 18:49:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239045AbhIISuj (ORCPT ); Thu, 9 Sep 2021 14:50:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48400 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245645AbhIIStP (ORCPT ); Thu, 9 Sep 2021 14:49:15 -0400 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BA8AC06175F for ; Thu, 9 Sep 2021 11:47:59 -0700 (PDT) Received: by mail-wr1-x434.google.com with SMTP id q11so3939455wrr.9 for ; Thu, 09 Sep 2021 11:47:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=vy4P0I34I90yBXSPW4HKvPnrTIWXoEj7lGW0PFnr3ZQ=; b=B8wmPhbqicsngZIkHwOMrxheZS4USFz2VjKv5W5EQ3pDCH20ymYANTQDzZBEcSwWHL 3aAz++ZgMRTHT3+FvlqP1Z8dQhzIsXjlYsk1ZKVLDPyiTfT4uU/8kvfyEk8hdR38Z1Ew 9eRWMT0qOqKgrmdpM9s9JEjDYqR8RJjszKwz5QEK3a3F00hyGu214OVhu67Tnql22Xjq Ve/YJtjCX1wslAo26XUld7D28dBISNHrE5hak1w5jZukxMIB0Jq4mCUyria1nRAIuhNd yJbtLHPLElpd5UbzQBHM2kF6J6fK+CnTY/8MbWWvPnavLgtEyPKMJbu8c30TtfhpwYAE bicA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=vy4P0I34I90yBXSPW4HKvPnrTIWXoEj7lGW0PFnr3ZQ=; b=GB0gObLe56b1MJZ3YAqKn5Vfco7fnuV/rYn0TnmfAM8UIg0e94PacrdKNaAZfq6NBq 1lbGQKKxQAuSGiw9rj0h/XRTPlKbXJOT5AAihF3s+9gh3z4Ulq0cyxtX1bbQKVsjC1+e w3tGhmtM3mL9m7tq3WRnS9N1FMLiqnj5PABVAnNdK6kvoIyH6ksEMuYmOUSa7FjcfB9T +COnupRpuqo7UGMZKxtx6utYEg5fDPujCW+eHzW8Fox4+izpxHG6bAiNE5YiRSXO8tNJ MerERaRd3117GEDHuFkMqHZZ7ey/B88pL1iis702nWLyjDaonMVgSlerM/+bjgGIS/ng umNg== X-Gm-Message-State: AOAM531fU4NX8YTwvceQUjnkJQ+KUOFtXaJHXlLrWvduChc8TQJVmLUh A7nySIqTDXCtarrtCohlM1zB/ozuiwY= X-Google-Smtp-Source: ABdhPJzEnn0qI/oJbijcIXm5GzkCEBeeqP3ycZ8VdzcmDa1XPFhEk1s3bjIaTTNJ9TQoZgWpFYL2oQ== X-Received: by 2002:a5d:5241:: with SMTP id k1mr5460798wrc.14.1631213278116; Thu, 09 Sep 2021 11:47:58 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t18sm2372907wrp.97.2021.09.09.11.47.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 09 Sep 2021 11:47:57 -0700 (PDT) Message-Id: <104fbc7502ffe3258d74bef240c7e2f5d91219b4.1631213265.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 09 Sep 2021 18:47:44 +0000 Subject: [PATCH v2 19/19] Add "test-tool dump-reftable" command. Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Carlo Marcelo Arenas =?utf-8?b?QmVsw7Nu?= , Han-Wen Nienhuys , Han-Wen Nienhuys Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Han-Wen Nienhuys From: Han-Wen Nienhuys This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys --- Makefile | 1 + t/helper/test-reftable.c | 5 +++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + 4 files changed, 8 insertions(+) diff --git a/Makefile b/Makefile index 619b1b15ce0..8ec68a70405 100644 --- a/Makefile +++ b/Makefile @@ -2474,6 +2474,7 @@ REFTABLE_OBJS += reftable/writer.o REFTABLE_TEST_OBJS += reftable/basics_test.o REFTABLE_TEST_OBJS += reftable/block_test.o +REFTABLE_TEST_OBJS += reftable/dump.o REFTABLE_TEST_OBJS += reftable/merged_test.o REFTABLE_TEST_OBJS += reftable/pq_test.o REFTABLE_TEST_OBJS += reftable/record_test.o diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c index 996da85f7b5..26b03d7b789 100644 --- a/t/helper/test-reftable.c +++ b/t/helper/test-reftable.c @@ -14,3 +14,8 @@ int cmd__reftable(int argc, const char **argv) tree_test_main(argc, argv); return 0; } + +int cmd__dump_reftable(int argc, const char **argv) +{ + return reftable_dump_main(argc, (char *const *)argv); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index f7c888ffda7..338a57b104d 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -61,6 +61,7 @@ static struct test_cmd cmds[] = { { "read-midx", cmd__read_midx }, { "ref-store", cmd__ref_store }, { "reftable", cmd__reftable }, + { "dump-reftable", cmd__dump_reftable }, { "regex", cmd__regex }, { "repository", cmd__repository }, { "revision-walking", cmd__revision_walking }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 25f77469146..48cee1f4a2d 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -19,6 +19,7 @@ int cmd__dump_cache_tree(int argc, const char **argv); int cmd__dump_fsmonitor(int argc, const char **argv); int cmd__dump_split_index(int argc, const char **argv); int cmd__dump_untracked_cache(int argc, const char **argv); +int cmd__dump_reftable(int argc, const char **argv); int cmd__example_decorate(int argc, const char **argv); int cmd__fast_rebase(int argc, const char **argv); int cmd__genrandom(int argc, const char **argv);