[v3] documentation: add tutorial for revision walking

Message ID	20190701201934.30321-1-emilyshaffer@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Mon, 1 Jul 2019 13:19:34 -0700 In-Reply-To: <20190626234915.171658-1-emilyshaffer@google.com> Message-Id: <20190701201934.30321-1-emilyshaffer@google.com> Mime-Version: 1.0 Subject: [PATCH v3] documentation: add tutorial for revision walking From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com>, Junio C Hamano <gitster@pobox.com>, Eric Sunshine <sunshine@sunshineco.com> Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk
Series	[v3] documentation: add tutorial for revision walking \| expand [v3] documentation: add tutorial for revision walking [RFC,v3,02/13] walken: add usage to enable -h

diff --git a/Documentation/Makefile b/Documentation/Makefile index 76f2ecfc1b..91e5da67c4 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS) TECH_DOCS += MyFirstContribution TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstRevWalk TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt new file mode 100644 index 0000000000..6a432f35a0 --- /dev/null +++ b/Documentation/MyFirstRevWalk.txt @@ -0,0 +1,908 @@ +My First Revision Walk +====================== + +== What's a Revision Walk? + +The revision walk is a key concept in Git - this is the process that underpins +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the +list of objects is found by walking parent relationships between objects. The +revision walk can also be used to determine whether or not a given object is +reachable from the current HEAD pointer. + +=== Related Reading + +- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of + the revision walker in its various incarnations. +- `Documentation/technical/api-revision-walking.txt` +- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] + gives a good overview of the types of objects in Git and what your revision + walk is really describing. + +== Setting Up + +Create a new branch from `master`. + +---- +git checkout -b revwalk origin/master +---- + +We'll put our fiddling into a new command. For fun, let's name it `git walken`. +Open up a new file `builtin/walken.c` and set up the command handler: + +---- +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + trace_printf(_("cmd_walken incoming...\n")); + return 0; +} +---- + +NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or +off at runtime. For the purposes of this tutorial, we will write `walken` as +though it is intended for use as a "plumbing" command: that is, a command which +is used primarily in scripts, rather than interactively by humans (a "porcelain" +command). So we will send our debug output to `trace_printf()` instead. When +running, enable trace output by setting the environment variable `GIT_TRACE`. + +Add usage text and `-h` handling, like all subcommands should consistently do +(our test suite will notice and complain if you fail to do so). + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + const char * const walken_usage[] = { + N_("git walken"), + NULL, + } + struct option options[] = { + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + + ... +} +---- + +Also add the relevant line in `builtin.h` near `cmd_whatchanged()`: + +---- +extern int cmd_walken(int argc, const char **argv, const char *prefix); +---- + +Include the command in `git.c` in `commands[]` near the entry for `whatchanged`, +maintaining alphabetical ordering: + +---- +{ "walken", cmd_walken, RUN_SETUP }, +---- + +Add it to the `Makefile` near the line for `builtin\worktree.o`: + +---- +BUILTIN_OBJS += builtin/walken.o +---- + +Build and test out your command, without forgetting to ensure the `DEVELOPER` +flag is set, and with `GIT_TRACE` enabled so the debug output can be seen: + +---- +$ echo DEVELOPER=1 >>config.mak +$ make +$ GIT_TRACE=1 ./bin-wrappers/git walken +---- + +NOTE: For a more exhaustive overview of the new command process, take a look at +`Documentation/MyFirstContribution.txt`. + +NOTE: A reference implementation can be found at TODO LINK. + +=== `struct rev_cmdline_info` + +The definition of `struct rev_cmdline_info` can be found in `revision.h`. + +This struct is contained within the `rev_info` struct and is used to reflect +parameters provided by the user over the CLI. + +`nr` represents the number of `rev_cmdline_entry` present in the array. + +`alloc` is used by the `ALLOC_GROW` macro. Check +`Documentation/technical/api-allocation-growing.txt` - this variable is used to +track the allocated size of the list. + +Per entry, we find: + +`item` is the object provided upon which to base the revision walk. Items in Git +can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.) + +`name` is the object ID (OID) of the object - a hex string you may be familiar +with from using Git to organize your source in the past. Check the tutorial +mentioned above towards the top for a discussion of where the OID can come +from. + +`whence` indicates some information about what to do with the parents of the +specified object. We'll explore this flag more later on; take a look at +`Documentation/revisions.txt` to get an idea of what could set the `whence` +value. + +`flags` are used to hint the beginning of the revision walk and are the first +block under the `#include`s in `revision.h`. The most likely ones to be set in +the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags +can be used during the walk, as well. + +=== `struct rev_info` + +This one is quite a bit longer, and many fields are only used during the walk +by `revision.c` - not configuration options. Most of the configurable flags in +`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a +good idea to take some time and read through that document. + +== Basic Commit Walk + +First, let's see if we can replicate the output of `git log --oneline`. We'll +refer back to the implementation frequently to discover norms when performing +a revision walk of our own. + +To do so, we'll first find all the commits, in order, which preceded the current +commit. We'll extract the name and subject of the commit from each. + +Ideally, we will also be able to find out which ones are currently at the tip of +various branches. + +=== Setting Up + +Preparing for your revision walk has some distinct stages. + +1. Perform default setup for this mode, and others which may be invoked. +2. Check configuration files for relevant settings. +3. Set up the `rev_info` struct. +4. Tweak the initialized `rev_info` to suit the current walk. +5. Prepare the `rev_info` for the walk. +6. Iterate over the objects, processing each one. + +==== Default Setups + +Before examining configuration files which may modify command behavior, set up +default state for switches or options your command may have. If your command +utilizes other Git components, ask them to set up their default states as well. +For instance, `git log` takes advantage of `grep` and `diff` functionality, so +its `init_log_defaults()` sets its own state (`decoration_style`) and asks +`grep` and `diff` to initialize themselves by calling each of their +initialization functions. + +For our purposes, within `git walken`, for the first example we don't intend to +use any other components within Git, and we don't have any configuration to do. +However, we may want to add some later, so for now, we can add an empty +placeholder. Create a new function in `builtin/walken.c`: + +---- +static void init_walken_defaults(void) +{ + /* + * We don't actually need the same components `git log` does; leave this + * empty for now. + */ +} +---- + +Make sure to add a line invoking it inside of `cmd_walken()`. + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + init_walken_defaults(); +} +---- + +==== Configuring From `.gitconfig` + +Next, we should have a look at any relevant configuration settings (i.e., +settings readable and settable from `git config`). This is done by providing a +callback to `git_config()`; within that callback, you can also invoke methods +from other components you may need that need to intercept these options. Your +callback will be invoked once per each configuration value which Git knows about +(global, local, worktree, etc.). + +Similarly to the default values, we don't have anything to do here yet +ourselves; however, we should call `git_default_config()` if we aren't calling +any other existing config callbacks. + +Add a new function to `builtin/walken.c`: + +---- +static int git_walken_config(const char *var, const char *value, void *cb) +{ + /* + * For now, we don't have any custom configuration, so fall back to + * the default config. + */ + return git_default_config(var, value, cb); +} +---- + +Make sure to invoke `git_config()` with it in your `cmd_walken()`: + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + ... + + git_config(git_walken_config, NULL); + + ... +} +---- + +// TODO: Checking CLI options + +==== Setting Up `rev_info` + +Now that we've gathered external configuration and options, it's time to +initialize the `rev_info` object which we will use to perform the walk. This is +typically done by calling `repo_init_revisions()` with the repository you intend +to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info` +struct. + +Add the `struct rev_info` and the `repo_init_revisions()` call: +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + /* This can go wherever you like in your declarations.*/ + struct rev_info rev; + ... + + /* This should go after the git_config() call. */ + repo_init_revisions(the_repository, &rev, prefix); + + ... +} +---- + +==== Tweaking `rev_info` For the Walk + +We're getting close, but we're still not quite ready to go. Now that `rev` is +initialized, we can modify it to fit our needs. This is usually done within a +helper for clarity, so let's add one: + +---- +static void final_rev_info_setup(struct rev_info *rev) +{ + /* + * We want to mimic the appearance of `git log --oneline`, so let's + * force oneline format. + */ + get_commit_format("oneline", rev); + + /* Start our revision walk at HEAD. */ + add_head_to_pending(rev); +} +---- + +[NOTE] +==== +Instead of using the shorthand `add_head_to_pending()`, you could do +something like this: +---- + struct setup_revision_opt opt; + + memset(&opt, 0, sizeof(opt)); + opt.def = "HEAD"; + opt.revarg_opt = REVARG_COMMITTISH; + setup_revisions(argc, argv, rev, &opt); +---- +Using a `setup_revision_opt` gives you finer control over your walk's starting +point. +==== + +Then let's invoke `final_rev_info_setup()` after the call to +`repo_init_revisions()`: + +---- +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + ... + + final_rev_info_setup(&rev); + + ... +} +---- + +Later, we may wish to add more arguments to `final_rev_info_setup()`. But for +now, this is all we need. + +==== Preparing `rev_info` For the Walk + +Now that `rev` is all initialized and configured, we've got one more setup step +before we get rolling. We can do this in a helper, which will both prepare the +`rev_info` for the walk, and perform the walk itself. Let's start the helper +with the call to `prepare_revision_walk()`, which can return an error without +dying on its own: + +---- +static void walken_commit_walk(struct rev_info *rev) +{ + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); +} +---- + +NOTE: `die()` prints to `stderr` and exits the program. Since it will print to +`stderr` it's likely to be seen by a human, so we will localize it. + +==== Performing the Walk! + +Finally! We are ready to begin the walk itself. Now we can see that `rev_info` +can also be used as an iterator; we move to the next item in the walk by using +`get_revision()` repeatedly. Add the listed variable declarations at the top and +the walk loop below the `prepare_revision_walk()` call within your +`walken_commit_walk()`: + +---- +static void walken_commit_walk(struct rev_info *rev) +{ + struct commit *commit; + struct strbuf prettybuf = STRBUF_INIT; + + ... + + while ((commit = get_revision(rev))) { + if (!commit) + continue; + + strbuf_reset(&prettybuf); + pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); + puts(prettybuf.buf); + } + strbuf_release(&prettybuf); +} +---- + +NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the +command we expect to be machine-parsed, we're sending it directly to stdout. + +Give it a shot. + +---- +$ make +$ ./bin-wrappers/git walken +---- + +You should see all of the subject lines of all the commits in +your tree's history, in order, ending with the initial commit, "Initial revision +of "git", the information manager from hell". Congratulations! You've written +your first revision walk. You can play with printing some additional fields +from each commit if you're curious; have a look at the functions available in +`commit.h`. + +=== Adding a Filter + +Next, let's try to filter the commits we see based on their author. This is +equivalent to running `git log --author=<pattern>`. We can add a filter by +modifying `rev_info.grep_filter`, which is a `struct grep_opt`. + +First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add +`grep_config()` to `git_walken_config()`: + +---- +static void init_walken_defaults(void) +{ + init_grep_defaults(the_repository); +} + +... + +static int git_walken_config(const char *var, const char *value, void *cb) +{ + grep_config(var, value, cb); + return git_default_config(var, value, cb); +} +---- + +Next, we can modify the `grep_filter`. This is done with convenience functions +found in `grep.h`. For fun, we're filtering to only commits from folks using a +`gmail.com` email address - a not-very-precise guess at who may be working on +Git as a hobby. Since we're checking the author, which is a specific line in the +header, we'll use the `append_header_grep_pattern()` helper. We can use +the `enum grep_header_field` to indicate which part of the commit header we want +to search. + +In `final_rev_info_setup()`, add your filter line: + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, + "gmail"); + compile_grep_patterns(&rev->grep_filter); + + ... +} +---- + +`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but +it won't work unless we compile it with `compile_grep_patterns()`. + +NOTE: If you are using `setup_revisions()` (for example, if you are passing a +`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need +to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. + +NOTE: We could add the same filter via the `append_grep_pattern()` helper if we +wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and +`enum grep_pat_token` for us. + +=== Changing the Order + +There are a few ways that we can change the order of the commits during a +revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some +sane orderings. + +`topo_order` is the same as `git log --topo-order`: we avoid showing a parent +before all of its children have been shown, and we avoid mixing commits which +are in different lines of history. (`git help log`'s section on `--topo-order` +has a very nice diagram to illustrate this.) + +Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to +`REV_SORT_BY_AUTHOR_DATE`. Add the following: + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + rev->topo_order = 1; + rev->sort_order = REV_SORT_BY_COMMIT_DATE; + + ... +} +---- + +Let's output this into a file so we can easily diff it with the walk sorted by +author date. + +---- +$ make +$ ./bin-wrappers/git walken > commit-date.txt +---- + +Then, let's sort by author date and run it again. + +---- +static void final_rev_info_setup(int argc, const char **argv, + const char *prefix, struct rev_info *rev) +{ + ... + + rev->topo_order = 1; + rev->sort_order = REV_SORT_BY_AUTHOR_DATE; + + ... +} +---- + +---- +$ make +$ ./bin-wrappers/git walken > author-date.txt +---- + +Finally, compare the two. This is a little less helpful without object names or +dates, but hopefully we get the idea. + +---- +$ diff -u commit-date.txt author-date.txt +---- + +This display indicates that commits can be reordered after they're written, for +example with `git rebase`. + +Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. +Set that flag somewhere inside of `final_rev_info_setup()`: + +---- +static void final_rev_info_setup(int argc, const char **argv, const char *prefix, + struct rev_info *rev) +{ + ... + + rev->reverse = 1; + + ... +} +---- + +Run your walk again and note the difference in order. (If you remove the grep +pattern, you should see the last commit this call gives you as your current +HEAD.) + +== Basic Object Walk + +So far we've been walking only commits. But Git has more types of objects than +that! Let's see if we can walk _all_ objects, and find out some information +about each one. + +We can base our work on an example. `git pack-objects` prepares all kinds of +objects for packing into a bitmap or packfile. The work we are interested in +resides in `builtins/pack-objects.c:get_object_list()`; examination of that +function shows that the all-object walk is being performed by +`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two +functions reside in `list-objects.c`; examining the source shows that, despite +the name, these functions traverse all kinds of objects. Let's have a look at +the arguments to `traverse_commit_list_filtered()`, which are a superset of the +arguments to the unfiltered version. + +- `struct list_objects_filter_options *filter_options`: This is a struct which + stores a filter-spec as outlined in `Documentation/rev-list-options.txt`. +- `struct rev_info *revs`: This is the `rev_info` used for the walk. +- `show_commit_fn show_commit`: A callback which will be used to handle each + individual commit object. +- `show_object_fn show_object`: A callback which will be used to handle each + non-commit object (so each blob, tree, or tag). +- `void *show_data`: A context buffer which is passed in turn to `show_commit` + and `show_object`. +- `struct oidset *omitted`: A linked-list of object IDs which the provided + filter caused to be omitted. + +It looks like this `traverse_commit_list_filtered()` uses callbacks we provide +instead of needing us to call it repeatedly ourselves. Cool! Let's add the +callbacks first. + +For the sake of this tutorial, we'll simply keep track of how many of each kind +of object we find. At file scope in `builtin/walken.c` add the following +tracking variables: + +---- +static int commit_count; +static int tag_count; +static int blob_count; +static int tree_count; +---- + +Commits are handled by a different callback than other objects; let's do that +one first: + +---- +static void walken_show_commit(struct commit *cmt, void *buf) +{ + commit_count++; +} +---- + +The `cmt` argument is fairly self-explanatory. But it's worth mentioning that +the `buf` argument is actually the context buffer that we can provide to the +traversal calls - `show_data`, which we mentioned a moment ago. + +Since we have the `struct commit` object, we can look at all the same parts that +we looked at in our earlier commit-only walk. For the sake of this tutorial, +though, we'll just increment the commit counter and move on. + +The callback for non-commits is a little different, as we'll need to check +which kind of object we're dealing with: + +---- +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + switch (obj->type) { + case OBJ_TREE: + tree_count++; + break; + case OBJ_BLOB: + blob_count++; + break; + case OBJ_TAG: + tag_count++; + break; + case OBJ_COMMIT: + BUG("unexpected commit object in walken_show_object\n"); + default: + BUG("unexpected object type %s in walken_show_object\n", + type_name(obj->type)); + } +} +---- + +Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same +context pointer that `walken_show_commit()` receives: the `show_data` argument +to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally, +`str` contains the name of the object, which ends up being something like +`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag). + +To help assure us that we aren't double-counting commits, we'll include some +complaining if a commit object is routed through our non-commit callback; we'll +also complain if we see an invalid object type. Since those two cases should be +unreachable, and would only change in the event of a semantic change to the Git +codebase, we complain by using `BUG()` - which is a signal to a developer that +the change they made caused unintended consequences, and the rest of the +codebase needs to be updated to understand that change. `BUG()` is not intended +to be seen by the public, so it is not localized. + +Our main object walk implementation is substantially different from our commit +walk implementation, so let's make a new function to perform the object walk. We +can perform setup which is applicable to all objects here, too, to keep separate +from setup which is applicable to commit-only walks. + +We'll start by enabling all types of objects in the `struct rev_info`. Unless +you cloned or fetched your repository earlier with a filter, +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it +on just to make sure our lives are simple. We'll also turn on +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and +everything it points to immediately after we find each commit, as opposed to +waiting for the end and walking through all trees after the commit history has +been discovered. With the appropriate settings configured, we are ready to call +`prepare_revision_walk()`. + +---- +static void walken_object_walk(struct rev_info *rev) +{ + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); + + commit_count = 0; + tag_count = 0; + blob_count = 0; + tree_count = 0; +---- + +Let's start by calling just the unfiltered walk and reporting our counts. +Complete your implementation of `walken_object_walk()`: + +---- + traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + + printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count, + blob_count, tag_count, tree_count); +} +---- + +NOTE: This output is intended to be machine-parsed. Therefore, we are not +sending it to `trace_printf()`, and we are not localizing it - we need scripts +to be able to count on the formatting to be exactly the way it is shown here. +If we were intending this output to be read by humans, we would need to localize +it with `_()`. + +Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing +command line options is out of scope for this tutorial, so we'll just hardcode +a branch we can change at compile time. Where you call `final_rev_info_setup()` +and `walken_commit_walk()`, instead branch like so: + +---- + if (1) { + add_head_to_pending(&rev); + walken_object_walk(&rev); + } else { + final_rev_info_setup(argc, argv, prefix, &rev); + walken_commit_walk(&rev); + } +---- + +NOTE: For simplicity, we've avoided all the filters and sorts we applied in +`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you +want, you can certainly use the filters we added before by moving +`final_rev_info_setup()` out of the conditional and removing the call to +`add_head_to_pending()`. + +Now we can try to run our command! It should take noticeably longer than the +commit walk, but an examination of the output will give you an idea why. Your +output should look similar to this example, but with different counts: + +---- +Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. +---- + +This makes sense. We have more trees than commits because the Git project has +lots of subdirectories which can change, plus at least one tree per commit. We +have no tags because we started on a commit (`HEAD`) and while tags can point to +commits, commits can't point to tags. + +NOTE: You will have different counts when you run this yourself! The number of +objects grows along with the Git project. + +=== Adding a Filter + +There are a handful of filters that we can apply to the object walk laid out in +`Documentation/rev-list-options.txt`. These filters are typically useful for +operations such as creating packfiles or performing a partial clone. They are +defined in `list-objects-filter-options.h`. For the purposes of this tutorial we +will use the "tree:1" filter, which causes the walk to omit all trees and blobs +which are not directly referenced by commits reachable from the commit in +`pending` when the walk begins. (`pending` is the list of objects which need to +be traversed during a walk; you can imagine a breadth-first tree traversal to +help understand. In our case, that means we omit trees and blobs not directly +referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only +`HEAD` in the `pending` list.) + +First, we'll need to `#include "list-objects-filter-options.h`" and set up the +`struct list_objects_filter_options` at the top of the function. + +---- +static void walken_object_walk(struct rev_info *rev) +{ + struct list_objects_filter_options filter_options = {}; + + ... +---- + +For now, we are not going to track the omitted objects, so we'll replace those +parameters with `NULL`. For the sake of simplicity, we'll add a simple +build-time branch to use our filter or not. Replace the line calling +`traverse_commit_list()` with the following, which will remind us which kind of +walk we've just performed: + +---- + if (0) { + /* Unfiltered: */ + trace_printf(_("Unfiltered object walk.\n")); + traverse_commit_list(rev, walken_show_commit, + walken_show_object, NULL); + } else { + trace_printf( + _("Filtered object walk with filterspec 'tree:1'.\n")); + parse_list_objects_filter(&filter_options, "tree:1"); + + traverse_commit_list_filtered(&filter_options, rev, + walken_show_commit, walken_show_object, NULL, NULL); + } +---- + +`struct list_objects_filter_options` is usually built directly from a command +line argument, so the module provides an easy way to build one from a string. +Even though we aren't taking user input right now, we can still build one with +a hardcoded string using `parse_list_objects_filter()`. + +With the filter spec "tree:1", we are expecting to see _only_ the root tree for +each commit; therefore, the tree object count should be less than or equal to +the number of commits. (For an example of why that's true: `git commit --revert` +points to the same tree object as its grandparent.) + +=== Counting Omitted Objects + +We also have the capability to enumerate all objects which were omitted by a +filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking +`traverse_commit_list_filtered()` to populate the `omitted` list means that our +revision walk does not perform any better than an unfiltered revision walk; all +reachable objects are walked in order to populate the list. + +First, add the `struct oidset` and related items we will use to iterate it: + +---- +static void walken_object_walk( + ... + + struct oidset omitted; + struct oidset_iter oit; + struct object_id *oid = NULL; + int omitted_count = 0; + oidset_init(&omitted, 0); + + ... +---- + +Modify the call to `traverse_commit_list_filtered()` to include your `omitted` +object: + +---- + ... + + traverse_commit_list_filtered(&filter_options, rev, + walken_show_commit, walken_show_object, NULL, &omitted); + + ... +---- + +Then, after your traversal, the `oidset` traversal is pretty straightforward. +Count all the objects within and modify the print statement: + +---- + /* Count the omitted objects. */ + oidset_iter_init(&omitted, &oit); + + while ((oid = oidset_iter_next(&oit))) + omitted_count++; + + printf("commits %d\nblobs %d\ntags %d\ntrees%d\nomitted %d\n", + commit_count, blob_count, tag_count, tree_count, omitted_count); +---- + +By running your walk with and without the filter, you should find that the total +object count in each case is identical. You can also time each invocation of +the `walken` subcommand, with and without `omitted` being passed in, to confirm +to yourself the runtime impact of tracking all omitted objects. + +=== Changing the Order + +Finally, let's demonstrate that you can also reorder walks of all objects, not +just walks of commits. First, we'll make our handlers chattier - modify +`walken_show_commit()` and `walken_show_object()` to print the object as they +go: + +---- +static void walken_show_commit(struct commit *cmt, void *buf) +{ + trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); + commit_count++; +} + +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); + + ... +} +---- + +NOTE: Since we will be examining this output directly as humans, we'll use +`trace_printf()` here. Additionally, since this change introduces a significant +number of printed lines, using `trace_printf()` will allow us to easily silence +those lines without having to recompile. + +(Leave the counter increment logic in place.) + +With only that change, run again (but save yourself some scrollback): + +---- +$ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10 +---- + +Take a look at the top commit with `git show` and the object ID you printed; it +should be the same as the output of `git show HEAD`. + +Next, let's change a setting on our `struct rev_info` within +`walken_object_walk()`. Find where you're changing the other settings on `rev`, +such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the +`reverse` setting at the bottom: + +---- + ... + + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + rev->reverse = 1; + + ... +---- + +Now, run again, but this time, let's grab the last handful of objects instead +of the first handful: + +---- +$ make +$ GIT_TRACE=1 ./bin-wrappers git walken | tail -n 10 +---- + +The last commit object given should have the same OID as the one we saw at the +top before, and running `git show <oid>` with that OID should give you again +the same results as `git show HEAD`. Furthermore, if you run and examine the +first ten lines again (with `head` instead of `tail` like we did before applying +the `reverse` setting), you should see that now the first commit printed is the +initial commit, `e83c5163`. + +== Wrapping Up + +Let's review. In this tutorial, we: + +- Built a commit walk from the ground up +- Enabled a grep filter for that commit walk +- Changed the sort order of that filtered commit walk +- Built an object walk (tags, commits, trees, and blobs) from the ground up +- Learned how to add a filter-spec to an object walk +- Changed the display order of the filtered object walk

[v3] documentation: add tutorial for revision walking

Commit Message

Comments

Patch