From patchwork Fri Jun 7 01:07:08 2019
X-Patchwork-Submitter: Emily Shaffer
X-Patchwork-Id: 10980625
Date: Thu, 6 Jun 2019 18:07:08 -0700
Message-Id: <20190607010708.46654-1-emilyshaffer@google.com>
X-Mailer: git-send-email 2.22.0.rc1.311.g5d7573a151-goog
Subject: [PATCH] documentation: add tutorial for revision walking
From: Emily Shaffer
To: git@vger.kernel.org
Cc: Emily Shaffer
X-Mailing-List: git@vger.kernel.org

Existing documentation on revision walks seems to be primarily intended as a reference for those already familiar with the procedure.
This tutorial attempts to give an entry-level guide to a couple of bare-bones revision walks so that new Git contributors can learn the concepts without having to wade through options parsing or special casing. The target audience is a Git contributor who is just getting started with the concept of revision walking. The goal is to prepare this contributor to be able to understand and modify existing commands which perform revision walks more easily, although it will also prepare contributors to create new commands which perform walks. The tutorial covers a basic overview of the structs involved during revision walk, setting up a basic commit walk, setting up a basic all-object walk, and adding some configuration changes to both walk types. It intentionally does not cover how to create new commands or search for options from the command line or gitconfigs. There is an associated patchset at https://github.com/nasamuffin/git/tree/revwalk that contains a reference implementation of the code generated by this tutorial. Signed-off-by: Emily Shaffer --- This one is longer than the MyFirstContribution one, thanks in advance to anybody with the wherewithal to review this. I'll also be mailing an RFC patchset In-Reply-To this message; the RFC patchset should not be merged to Git, as I intend to host it in my own mirror as an example. I hosted a similar example for the MyFirstContribution tutorial; it's visible at https://github.com/nasamuffin/git/tree/psuh. There might be a better place to host these so I don't "own" them but I'm not sure what it is; keeping them as a live branch somewhere struck me as an okay way to keep them from getting stale. Looking forward to hearing everyone's comments! 
- Emily Documentation/.gitignore | 1 + Documentation/Makefile | 1 + Documentation/MyFirstRevWalk.txt | 826 +++++++++++++++++++++++++++++++ 3 files changed, 828 insertions(+) create mode 100644 Documentation/MyFirstRevWalk.txt diff --git a/Documentation/.gitignore b/Documentation/.gitignore index 9022d48355..0e3df737c5 100644 --- a/Documentation/.gitignore +++ b/Documentation/.gitignore @@ -12,6 +12,7 @@ cmds-*.txt mergetools-*.txt manpage-base-url.xsl SubmittingPatches.txt +MyFirstRevWalk.txt tmp-doc-diff/ GIT-ASCIIDOCFLAGS /GIT-EXCLUDED-PROGRAMS diff --git a/Documentation/Makefile b/Documentation/Makefile index dbf5a0f276..d57b80962f 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica SP_ARTICLES += $(API_DOCS) TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstRevWalk TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt new file mode 100644 index 0000000000..494c09d1fa --- /dev/null +++ b/Documentation/MyFirstRevWalk.txt @@ -0,0 +1,826 @@ +My First Revision Walk +====================== + +== What's a Revision Walk? + +The revision walk is a key concept in Git - this is the process that underpins +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the +list of objects is found by walking parent relationships between objects. The +revision walk can also be used to determine whether or not a given object is +reachable from the current HEAD pointer. + +=== Related Reading + +- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of + the revision walker in its various incarnations. 
+- `Documentation/technical/api-revision-walking.txt` +- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] + gives a good overview of the types of objects in Git and what your revision + walk is really describing. + +== Setting Up + +Create a new branch from `master`. + +---- +git checkout -b revwalk origin/master +---- + +We'll put our fiddling into a new command. For fun, let's name it `git walken`. +Open up a new file `builtin/walken.c` and set up the command handler: + +---- +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include <stdio.h> +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + printf(_("cmd_walken incoming...\n")); + return 0; +} +---- + +Add usage text and `-h` handling, in order to pass the test suite: + +---- +static const char * const walken_usage[] = { + N_("git walken"), + NULL, +}; + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + struct option options[] = { + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + + ... +} +---- + +Also add the relevant line in `builtin.h` near `cmd_whatchanged()`: + +---- +extern int cmd_walken(int argc, const char **argv, const char *prefix); +---- + +Include the command in `git.c` in `commands[]` near the entry for `whatchanged`: + +---- +{ "walken", cmd_walken, RUN_SETUP }, +---- + +Add it to the `Makefile` near the line for `builtin/worktree.o`: + +---- +BUILTIN_OBJS += builtin/walken.o +---- + +Build and test out your command, without forgetting to ensure the `DEVELOPER` +flag is set: + +---- +echo DEVELOPER=1 >config.mak +make +./bin-wrappers/git walken +---- + +NOTE: For a more exhaustive overview of the new command process, take a look at +`Documentation/MyFirstContribution`. + +NOTE: A reference implementation can be found at +https://github.com/nasamuffin/git/tree/revwalk. 
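The registration steps above (a handler function, a declaration in `builtin.h`, and an entry in `commands[]`) all serve one purpose: letting `git.c` look a command up by name and call its handler. The following is a minimal standalone sketch of that idea, not Git's real code. `struct toy_cmd`, `toy_walken()`, `toy_commands[]`, and `toy_dispatch()` are all hypothetical names invented for illustration, and the real table also carries option flags such as `RUN_SETUP`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in a builtin command table. */
struct toy_cmd {
	const char *name;
	int (*fn)(int argc, const char **argv, const char *prefix);
};

/* Hypothetical handler with the same shape as cmd_walken(). */
static int toy_walken(int argc, const char **argv, const char *prefix)
{
	(void)argv;
	(void)prefix;
	return argc; /* echo argc back so callers can observe the dispatch */
}

/* Adding a command means adding one line here. */
static struct toy_cmd toy_commands[] = {
	{ "walken", toy_walken },
};

/* Returns the handler's result, or -1 when no command matches. */
static int toy_dispatch(const char *name, int argc, const char **argv)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(toy_commands) / sizeof(toy_commands[0]); i++)
		if (!strcmp(toy_commands[i].name, name))
			return toy_commands[i].fn(argc, argv, NULL);
	return -1;
}
```

Calling `toy_dispatch("walken", argc, argv)` from a `main()` runs the handler; an unknown name falls through to `-1`, which is roughly where a real dispatcher would go on to try other ways of resolving the command.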
+ +=== `struct rev_cmdline_info` + +The definition of `struct rev_cmdline_info` can be found in `revision.h`. + +This struct is contained within the `rev_info` struct and is used to reflect +parameters provided by the user over the CLI. + +`nr` represents the number of `rev_cmdline_entry` present in the array. + +`alloc` is used by the `ALLOC_GROW` macro. Check +`Documentation/technical/api-allocation-growing.txt` - this variable is used to +track the allocated size of the list. + +Per entry, we find: + +`item` is the object provided upon which to base the revision walk. Items in Git +can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.) + +`name` is the SHA-1 of the object - a 40-digit hex string you may be familiar +with from using Git to organize your source in the past. Check the tutorial +mentioned above towards the top for a discussion of where the SHA-1 can come +from. + +`whence` indicates some information about what to do with the parents of the +specified object. We'll explore this flag more later on; take a look at +`Documentation/revisions.txt` to get an idea of what could set the `whence` +value. + +`flags` are used to hint the beginning of the revision walk and are the first +block under the `#include`s in `revision.h`. The most likely ones to be set in +the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags +can be used during the walk, as well. + +=== `struct rev_info` + +This one is quite a bit longer, and many fields are only used during the walk +by `revision.c` - not configuration options. Most of the configurable flags in +`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a +good idea to take some time and read through that document. + +== Basic Commit Walk + +First, let's see if we can replicate the output of `git log --oneline`. We'll +refer back to the implementation frequently to discover norms when performing +a revision walk of our own. 
+ +We'll need all the commits, in order, which preceded our current commit. We will +also need to know the name and subject. + +Ideally, we will also be able to find out which ones are currently at the tip of +various branches. + +=== Setting Up + +Preparing for your revision walk has some distinct stages. + +1. Perform default setup for this mode, and others which may be invoked. +2. Check configuration files for relevant settings. +3. Set up the `rev_info` struct. +4. Tweak the initialized `rev_info` to suit the current walk. +5. Prepare the `rev_info` for the walk. +6. Iterate over the objects, processing each one. + +==== Default Setups + +Before you begin to examine user configuration for your revision walk, it's +common practice to initialize any switches your command may have to their +defaults, and to ask any other components you invoke to initialize themselves as +well. `git log` does this in `init_log_defaults()`; in that case, one global +`decoration_style` is initialized, as well as the grep and diff-UI components. + +For the first example within `git walken`, we don't intend to invoke any other +components, and we don't have any configuration to do. However, +we may want to add some later, so for now, we can add an empty placeholder. +Create a new function in `builtin/walken.c`: + +---- +static void init_walken_defaults(void) +{ + /* We don't actually need the same components `git log` does; leave this + * empty for now. + */ +} +---- + +Make sure to add a line invoking it inside of `cmd_walken()`. + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + init_walken_defaults(); +} +---- + +==== Configuring From `.gitconfig` + +Next, we should have a look at any relevant configuration settings (i.e., +settings readable and settable from `git config`). 
This is done by providing a +callback to `git_config()`; within that callback, you can also invoke functions +from other components that need to intercept these options. Your +callback will be invoked once for each configuration value which Git knows about +(global, local, worktree, etc.). + +Similarly to the default values, we don't have anything to do here yet +ourselves; however, we should call `git_default_config()` if we aren't calling +any other existing config callbacks. + +TODO: Use the "modern" configset API + +Add a new function to `builtin/walken.c`: + +---- +static int git_walken_config(const char *var, const char *value, void *cb) +{ + /* For now, let's not bother with anything. */ + return git_default_config(var, value, cb); +} +---- + +Make sure to invoke `git_config()` with it in your `cmd_walken()`: + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + ... + + git_config(git_walken_config, NULL); +} +---- + +// TODO: Checking CLI options + +==== Setting Up `rev_info` + +Now that we've gathered external configuration and options, it's time to +initialize the `rev_info` object which we will use to perform the walk. This is +typically done by calling `repo_init_revisions()` with the repository you intend +to target, as well as the prefix and your `rev_info` struct. + +Add the `struct rev_info` and the `repo_init_revisions()` call: + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + /* This can go wherever you like in your declarations. */ + struct rev_info rev; + ... + + /* This should go after the git_config() call. */ + repo_init_revisions(the_repository, &rev, prefix); +} +---- + +==== Tweaking `rev_info` For the Walk + +We're getting close, but we're still not quite ready to go. Now that `rev` is +initialized, we can modify it to fit our needs. 
This is usually done within a +helper for clarity, so let's add one: + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + /* We want to mimic the appearance of `git log --oneline`, so let's + * force oneline format. */ + get_commit_format("oneline", rev); + + /* Start our revision walk at HEAD. */ + add_head_to_pending(rev); +} +---- + +[NOTE] +==== +Instead of using the shorthand `add_head_to_pending()`, you could do +something like this: +---- + struct setup_revision_opt opt; + + memset(&opt, 0, sizeof(opt)); + opt.def = "HEAD"; + opt.revarg_opt = REVARG_COMMITTISH; + setup_revisions(argc, argv, rev, &opt); +---- +Using a `setup_revision_opt` gives you finer control over your walk's starting +point. +==== + +Then let's invoke `final_rev_info_setup()` after the call to +`repo_init_revisions()`: + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + ... + + final_rev_info_setup(argc, argv, prefix, &rev); +} +---- + +We pass `argc`, `argv`, and `prefix` along even though this version of +`final_rev_info_setup()` doesn't use them yet; they are needed if you switch to +`setup_revisions()` as shown in the note above. + +==== Preparing `rev_info` For the Walk + +Now that `rev` is all initialized and configured, we've got one more setup step +before we get rolling. We can do this in a helper, which will both prepare the +`rev_info` for the walk, and perform the walk itself. Let's start the helper +with the call to `prepare_revision_walk()`. + +---- +static int walken_commit_walk(struct rev_info *rev) +{ + /* prepare_revision_walk() gets the final steps ready for a revision + * walk. We check the return value for errors. */ + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); +} +---- + +==== Performing the Walk! + +Finally! We are ready to begin the walk itself. Now we can see that `rev_info` +can also be used as an iterator; we move to the next item in the walk by using +`get_revision()` repeatedly. 
Add the listed variable declarations at the top and +the walk loop below the `prepare_revision_walk()` call within your +`walken_commit_walk()`: + +---- +static int walken_commit_walk(struct rev_info *rev) +{ + struct commit *commit; + struct strbuf prettybuf; + strbuf_init(&prettybuf, 0); + + ... + + /* get_revision() returns NULL once the walk is exhausted. */ + while ((commit = get_revision(rev)) != NULL) { + strbuf_reset(&prettybuf); + pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); + printf("%s\n", prettybuf.buf); + } + + /* Free the buffer we reused for each commit. */ + strbuf_release(&prettybuf); + return 0; +} +---- + +Give it a shot. + +---- +$ make +$ ./bin-wrappers/git walken +---- + +You should see the subject lines of all the commits in +your tree's history, in order, ending with the initial commit, "Initial revision +of "git", the information manager from hell". Congratulations! You've written +your first revision walk. You can play with printing some additional fields +from each commit if you're curious; have a look at the functions available in +`commit.h`. + +=== Adding a Filter + +Next, let's try to filter the commits we see based on their author. This is +equivalent to running `git log --author=<pattern>`. We can add a filter by +modifying `rev_info.grep_filter`, which is a `struct grep_opt`. + +First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add +`grep_config()` to `git_walken_config()`: + +---- +static void init_walken_defaults(void) +{ + init_grep_defaults(the_repository); +} + +... + +static int git_walken_config(const char *var, const char *value, void *cb) +{ + grep_config(var, value, cb); + return git_default_config(var, value, cb); +} +---- + +Next, we can modify the `grep_filter`. This is done with convenience functions +found in `grep.h`. For fun, we're filtering to only commits from folks using a +gmail.com email address - a not-very-precise guess at who may be working on Git +as a hobby. 
Since we're checking the author, which is a specific line in the +header, we'll use the `append_header_grep_pattern()` helper. We can use +the `enum grep_header_field` to indicate which part of the commit header we want +to search. + +In `final_rev_info_setup()`, add your filter line: + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, + "gmail"); + compile_grep_patterns(&rev->grep_filter); + + ... +} +---- + +`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but +it won't work unless we compile it with `compile_grep_patterns()`. + +NOTE: If you are using `setup_revisions()` (for example, if you are passing a +`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need +to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. + +NOTE: We could add the same filter via the `append_grep_pattern()` helper if we +wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and +`enum grep_pat_token` for us. + +=== Changing the Order + +There are a few ways that we can change the order of the commits during a +revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some +sane orderings. + +Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to +`REV_SORT_BY_AUTHOR_DATE`. Add the following: + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + rev->topo_order = 1; + rev->sort_order = REV_SORT_BY_COMMIT_DATE; + + ... +} +---- + +Let's output this into a file so we can easily diff it with the walk sorted by +author date. + +---- +$ make +$ ./bin-wrappers/git walken > commit-date.txt +---- + +Then, let's sort by author date and run it again. 
+ +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + rev->topo_order = 1; + rev->sort_order = REV_SORT_BY_AUTHOR_DATE; + + ... +} +---- + +---- +$ make +$ ./bin-wrappers/git walken > author-date.txt +---- + +Finally, compare the two. This is a little less helpful without object names or +dates, but hopefully we get the idea. + +---- +$ diff -u commit-date.txt author-date.txt +---- + +The differences give a rough indication of the latency between a commit first +being published for review and actually being merged into `master`. + +Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. +However, it needs to be applied after `add_head_to_pending()` is called. Find +the line where you call `add_head_to_pending()` and set the `reverse` flag right +after: + +---- +static void final_rev_info_setup(int argc, const char **argv, const char *prefix, + struct rev_info *rev) +{ + ... + + add_head_to_pending(rev); + rev->reverse = 1; + + ... +} +---- + +Run your walk again and note the difference in order. (If you remove the grep +pattern, you should see the last commit this call gives you as your current +HEAD.) + +== Basic Object Walk + +So far we've been walking only commits. But Git has more types of objects than +that! Let's see if we can walk _all_ objects, and find out some information +about each one. + +We can base our work on an example. `git pack-objects` prepares all kinds of +objects for packing into a bitmap or packfile. The work we are interested in +resides in `builtin/pack-objects.c:get_object_list()`; examination of that +function shows that the all-object walk is being performed by +`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two +functions reside in `list-objects.c`; examining the source shows that, despite +the name, these functions traverse all kinds of objects. 
Let's have a look at +the arguments to `traverse_commit_list_filtered()`, which are a superset of the +arguments to the unfiltered version. + +- `struct list_objects_filter_options *filter_options`: This is a struct which + stores a filter-spec as outlined in `Documentation/rev-list-options.txt`. +- `struct rev_info *revs`: This is the `rev_info` used for the walk. +- `show_commit_fn show_commit`: A callback which will be used to handle each + individual commit object. +- `show_object_fn show_object`: A callback which will be used to handle each + non-commit object (so each blob, tree, or tag). +- `void *show_data`: A context buffer which is passed in turn to `show_commit` + and `show_object`. +- `struct oidset *omitted`: A set of object IDs which the provided + filter caused to be omitted. + +It looks like `traverse_commit_list_filtered()` uses callbacks we provide +instead of needing us to call it repeatedly ourselves. Cool! Let's add the +callbacks first. + +For the sake of this tutorial, we'll simply keep track of how many of each kind +of object we find. At file scope in `builtin/walken.c` add the following +tracking variables: + +---- +static int commit_count; +static int tag_count; +static int blob_count; +static int tree_count; +---- + +Commits are handled by a different callback than other objects; let's do that +one first: + +---- +static void walken_show_commit(struct commit *cmt, void *buf) +{ + commit_count++; +} +---- + +Since we have the `struct commit` object, we can look at all the same parts that +we looked at in our earlier commit-only walk. For the sake of this tutorial, +though, we'll just increment the commit counter and move on. 
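To make the callback-driven shape of the traversal concrete, here is a small standalone sketch, assuming nothing from Git itself. `toy_traverse()`, `struct toy_object`, `struct toy_counts`, and the two counting callbacks are hypothetical names made up for this illustration; they only mirror the roles of `show_commit`, `show_object`, and `show_data` described above. Unlike the tutorial code, the sketch keeps its counters in the context pointer rather than at file scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object model; real Git objects carry much more. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
};

/* Callback types playing the roles of show_commit_fn / show_object_fn. */
typedef void (*toy_show_commit_fn)(const struct toy_object *obj, void *data);
typedef void (*toy_show_object_fn)(const struct toy_object *obj, void *data);

/* Context passed through the traversal, like show_data. */
struct toy_counts {
	int commits;
	int others;
};

static void count_commit(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->commits++;
}

static void count_other(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->others++;
}

/*
 * The traversal owns the loop; callers only supply callbacks. Commits go
 * to show_commit, everything else goes to show_object.
 */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_commit_fn show_commit,
			 toy_show_object_fn show_object, void *show_data)
{
	size_t i;

	assert(show_commit && show_object);
	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], show_data);
	}
}
```

Threading a `struct toy_counts` through `show_data` is one possible design; the tutorial's file-scope counters are simpler to type, while a context struct avoids shared state if you ever run more than one walk.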
+ +The callback for non-commits is a little different, as we'll need to check +which kind of object we're dealing with: + +---- +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + switch (obj->type) { + case OBJ_TREE: + tree_count++; + break; + case OBJ_BLOB: + blob_count++; + break; + case OBJ_TAG: + tag_count++; + break; + case OBJ_COMMIT: + printf(_("Unexpectedly encountered a commit in " + "walken_show_object!\n")); + commit_count++; + break; + default: + printf(_("Unexpected object type %s!\n"), + type_name(obj->type)); + break; + } +} +---- + +To help assure us that we aren't double-counting commits, we'll include some +complaining if a commit object is routed through our non-commit callback; we'll +also complain if we see an invalid object type. + +Our main object walk implementation is substantially different from our commit +walk implementation, so let's make a new function to perform the object walk. We +can perform setup which is applicable to all objects here, too, to keep separate +from setup which is applicable to commit-only walks. + +---- +static int walken_object_walk(struct rev_info *rev) +{ +} +---- + +We'll start by enabling all types of objects in the `struct rev_info`, and +asking to have our trees and blobs shown in commit order. We'll also exclude +promisors as the walk becomes more complicated with those types of objects. When +our settings are ready, we'll perform the normal revision walk setup and +initialize our tracking variables. 
+ +---- +static int walken_object_walk(struct rev_info *rev) +{ + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); + + commit_count = 0; + tag_count = 0; + blob_count = 0; + tree_count = 0; +---- + +Unless you cloned or fetched your repository earlier with a filter, +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it +on just to make sure our lives are simple. We'll also turn on +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and +everything it points to immediately after we find each commit, as opposed to +waiting for the end and walking through all trees after the commit history has +been discovered. + +Let's start by calling just the unfiltered walk and reporting our counts. +Complete your implementation of `walken_object_walk()`: + +---- + traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + + printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, " + "and %d trees.\n"), commit_count, blob_count, tag_count, + tree_count); + + return 0; +} +---- + +Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing +command line options is out of scope for this tutorial, so we'll just hardcode +a branch we can change at compile time. Where you call `final_rev_info_setup()` +and `walken_commit_walk()`, instead branch like so: + +---- + if (1) { + add_head_to_pending(&rev); + walken_object_walk(&rev); + } else { + final_rev_info_setup(argc, argv, prefix, &rev); + walken_commit_walk(&rev); + } +---- + +NOTE: For simplicity, we've avoided all the filters and sorts we applied in +`final_rev_info_setup()` and simply added `HEAD` to our pending queue. 
If you +want, you can certainly use the filters we added before by moving +`final_rev_info_setup()` out of the conditional and removing the call to +`add_head_to_pending()`. + +Now we can try to run our command! It should take noticeably longer than the +commit walk, but an examination of the output will give you an idea why - for +example: + +---- +Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. +---- + +This makes sense. We have more trees than commits because the Git project has +lots of subdirectories which can change, plus at least one tree per commit. We +have no tags because we started on a commit (`HEAD`) and while tags can point to +commits, commits can't point to tags. + +NOTE: You will have different counts when you run this yourself! The number of +objects grows along with the Git project. + +=== Adding a Filter + +A handful of filters that we can apply to the object walk are laid out in +`Documentation/rev-list-options.txt`. These filters are typically useful for +operations such as creating packfiles or performing a partial or shallow clone. +They are defined in `list-objects-filter-options.h`. For the purposes of this +tutorial we will use the "tree:1" filter, which causes the walk to omit all +trees and blobs which are not directly referenced by commits reachable from the +commit in `pending` when the walk begins. (In our case, that means we omit trees +and blobs not directly referenced by HEAD or HEAD's history.) + +First, we'll need to `#include "list-objects-filter-options.h"`. Then, we can +set up the `struct list_objects_filter_options` and `struct oidset` at the top +of `walken_object_walk()`: + +---- +static int walken_object_walk(struct rev_info *rev) +{ + struct list_objects_filter_options filter_options = { 0 }; + struct oidset omitted; + oidset_init(&omitted, 0); + ... +---- + +Then, to keep things simple, we'll add a build-time branch to use +our filter or not. 
Replace the line calling `traverse_commit_list()` with the +following, which will remind us which kind of walk we've just performed: + +---- + if (1) { + /* Unfiltered: */ + printf(_("Unfiltered object walk.\n")); + traverse_commit_list(rev, walken_show_commit, + walken_show_object, NULL); + } else { + printf(_("Filtered object walk with filterspec 'tree:1'.\n")); + /* + * We can parse a tree depth of 1 to demonstrate the kind of + * filtering that could occur e.g. during shallow cloning. + */ + parse_list_objects_filter(&filter_options, "tree:1"); + + traverse_commit_list_filtered(&filter_options, rev, + walken_show_commit, walken_show_object, NULL, &omitted); + } +---- + +`struct list_objects_filter_options` is usually built directly from a command +line argument, so the module provides an easy way to build one from a string. +Even though we aren't taking user input right now, we can still build one with +a hardcoded string using `parse_list_objects_filter()`. + +After we run `traverse_commit_list_filtered()` we can examine +`omitted`, which is a set of all the objects the filter left out of our walk. +Because every omitted object still has to be recorded, the performance of +`traverse_commit_list_filtered()` with a non-null `omitted` argument is +comparable to the performance of `traverse_commit_list()`; if you don't need the +set, you can pass `NULL` instead. Iterating over it is easy, though - check `oidset.h` +for the declaration of the accessor methods for `oidset`. + +With the filter spec "tree:1", we are expecting to see _only_ the root tree for +each commit; therefore, the tree object count should be less than or equal to +the number of commits. (For an example of why that's true: a commit created by +`git revert` of its parent points to the same tree object as its grandparent.) + +=== Changing the Order + +Finally, let's demonstrate that you can also reorder walks of all objects, not +just walks of commits. 
First, we'll make our handlers chattier - modify +`walken_show_commit()` and `walken_show_object()` to print each object as they go: + +---- +static void walken_show_commit(struct commit *cmt, void *buf) +{ + printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid)); + commit_count++; +} + +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid)); + ... +} +---- + +(Try to leave the counter increment logic in place in `walken_show_object()`.) + +With only that change, rebuild and run again (but save yourself some scrollback): + +---- +$ make +$ ./bin-wrappers/git walken | head -n 10 +---- + +Take a look at the top commit with `git show` and the OID you printed; it should +be the same as the output of `git show HEAD`. + +Next, let's change a setting on our `struct rev_info` within +`walken_object_walk()`. Find where you're changing the other settings on `rev`, +such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add +another setting at the bottom: + +---- + ... + + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + rev->reverse = 1; + + ... +---- + +Now, run again, but this time, let's grab the last handful of objects instead +of the first handful: + +---- +$ make +$ ./bin-wrappers/git walken | tail -n 10 +---- + +The last commit object given should have the same OID as the one we saw at the +top before, and running `git show <oid>` with that OID should again give you +the same results as `git show HEAD`. Furthermore, if you run and examine the +first ten lines again (with `head` instead of `tail` like we did before applying +the `reverse` setting), you should see that now the first commit printed is the +initial commit, `e83c5163`. + +== Wrapping Up + +Let's review. 
In this tutorial, we: + +- Built a commit walk from the ground up +- Enabled a grep filter for that commit walk +- Changed the sort order of that filtered commit walk +- Built an object walk (tags, commits, trees, and blobs) from the ground up +- Learned how to add a filter-spec to an object walk +- Changed the display order of the filtered object walk
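As a recap of the commit walk itself, the following standalone toy models the idea from the opening section: start from a tip, repeatedly take the newest pending commit, and queue its unseen parents. It is a simplified sketch under stated assumptions and not how `revision.c` is actually implemented. `struct toy_commit`, `TOY_SEEN`, `TOY_MAX`, and `toy_walk()` are hypothetical names, commits have at most two parents, and the `reverse` argument only imitates the observable effect of `rev->reverse` by flipping the collected output.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16    /* capacity of the toy pending queue */
#define TOY_SEEN 1u   /* marks commits already queued */

/* Hypothetical commit: an id, a date, and up to two parents. */
struct toy_commit {
	int id;
	int date;
	unsigned flags;
	struct toy_commit *parents[2];
};

/*
 * Walk history from `tip`, always emitting the newest pending commit next.
 * Commit ids are written to `out`; the number written is returned.
 */
static size_t toy_walk(struct toy_commit *tip, int *out, size_t out_cap,
		       int reverse)
{
	struct toy_commit *pending[TOY_MAX];
	size_t nr_pending = 0, nr_out = 0, i;

	assert(tip != NULL);
	tip->flags |= TOY_SEEN;
	pending[nr_pending++] = tip;

	while (nr_pending && nr_out < out_cap) {
		size_t best = 0;
		struct toy_commit *c;

		/* Pick the pending commit with the newest date. */
		for (i = 1; i < nr_pending; i++)
			if (pending[i]->date > pending[best]->date)
				best = i;
		c = pending[best];
		pending[best] = pending[--nr_pending];
		out[nr_out++] = c->id;

		/* Queue each parent once, using the SEEN flag. */
		for (i = 0; i < 2; i++) {
			struct toy_commit *p = c->parents[i];

			if (p && !(p->flags & TOY_SEEN) && nr_pending < TOY_MAX) {
				p->flags |= TOY_SEEN;
				pending[nr_pending++] = p;
			}
		}
	}

	/* Imitate a reversed walk by flipping what was collected. */
	if (reverse)
		for (i = 0; i < nr_out / 2; i++) {
			int tmp = out[i];

			out[i] = out[nr_out - 1 - i];
			out[nr_out - 1 - i] = tmp;
		}
	return nr_out;
}
```

With a root commit, two children of it, and a merge of those children as the tip, the walk visits the merge first and the root last; passing a non-zero `reverse` yields the same commits oldest first. The flag on each commit must be cleared before walking the same toy history a second time.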