diff mbox series

[01/19] revision.h: avoid bit fields in struct rev_info

Message ID 20190508111249.15262-2-pclouds@gmail.com (mailing list archive)
State New, archived
Headers show
Series Convert revision.c to parseopt part 1/4 | expand

Commit Message

Duy Nguyen May 8, 2019, 11:12 a.m. UTC
Bitfield addresses cannot be passed around in a pointer. This makes it
hard to use parse-options to set/unset them. Turn this struct to
normal integers. This of course increases the size of this struct
multiple times, but since we only have a handful of rev_info variables
around, memory consumption is not at all a concern.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 revision.h | 164 ++++++++++++++++++++++++++---------------------------
 1 file changed, 82 insertions(+), 82 deletions(-)

Comments

Derrick Stolee May 8, 2019, 2:07 p.m. UTC | #1
On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
> Bitfield addresses cannot be passed around in a pointer. This makes it
> hard to use parse-options to set/unset them. Turn this struct to
> normal integers. This of course increases the size of this struct
> multiple times, but since we only have a handful of rev_info variables
> around, memory consumption is not at all a concern.

I think you are right that this memory trade-off shouldn't be a problem.

What worries me instead is that we are using an "internal" data structure
for option parsing. Would it make more sense to create a struct for use
in the parse_opts method and a method that translates those options into
the bitfield in struct rev_info?

Thanks,
-Stolee
Duy Nguyen May 8, 2019, 2:41 p.m. UTC | #2
On Wed, May 8, 2019 at 9:07 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
> > Bitfield addresses cannot be passed around in a pointer. This makes it
> > hard to use parse-options to set/unset them. Turn this struct to
> > normal integers. This of course increases the size of this struct
> > multiple times, but since we only have a handful of rev_info variables
> > around, memory consumption is not at all a concern.
>
> I think you are right that this memory trade-off shouldn't be a problem.
>
> What worries me instead is that we are using an "internal" data structure
> for option parsing. Would it make more sense to create a struct for use
> in the parse_opts method and a method that translates those options into
> the bitfield in struct rev_info?

But we are doing that now (option parsing) using the same data
structure. Why would changing from a custom parser to parse_options()
affect what fields it should or should not touch in rev_info? Genuine
question. Maybe you could elaborate more about "internal". I probably
missed something. Or maybe this is a good opportunity to separate
intermediate option parsing variables from the rest of rev_info?
Derrick Stolee May 8, 2019, 3:52 p.m. UTC | #3
On 5/8/2019 10:41 AM, Duy Nguyen wrote:
> On Wed, May 8, 2019 at 9:07 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
>>> Bitfield addresses cannot be passed around in a pointer. This makes it
>>> hard to use parse-options to set/unset them. Turn this struct to
>>> normal integers. This of course increases the size of this struct
>>> multiple times, but since we only have a handful of rev_info variables
>>> around, memory consumption is not at all a concern.
>>
>> I think you are right that this memory trade-off shouldn't be a problem.
>>
>> What worries me instead is that we are using an "internal" data structure
>> for option parsing. Would it make more sense to create a struct for use
>> in the parse_opts method and a method that translates those options into
>> the bitfield in struct rev_info?
> 
> But we are doing that now (option parsing) using the same data
> structure. Why would changing from a custom parser to parse_options()
> affect what fields it should or should not touch in rev_info? Genuine
> question. Maybe you could elaborate more about "internal". I probably
> missed something. Or maybe this is a good opportunity to separate
> intermediate option parsing variables from the rest of rev_info?

You're right. I was unclear.

rev_info stores a lot of data. Some of the fields are important
in-memory structures that are crucial to the workings of revision.c
and are never used by builtin/rev-list.c. Combining this purpose
with the option parsing seems smelly to me.

Thinking more on it, I would prefer a more invasive change that may
pay off in the long term. These options, along with the "starting list"
values, could be extracted to a 'struct rev_options' that contains these
integers and the commit list. Then your option parsing changes could be
limited to a rev_options struct, which is later inserted into a rev_info
struct during setup_revisions().

Generally, the rev_info struct has too many members and could be split
into smaller pieces according to purpose. I created the topo_walk_info
struct as a way to not make the situation worse, but doesn't fix existing
pain.

My ramblings are mostly complaining about old code that grew organically
across many many quality additions. It is definitely hard to understand
the revision-walking code, and perhaps it would be easier to understand
with a little more structure.

The biggest issue with my suggestion is that it requires changing the
consumers of the options, as they would no longer live directly on the
rev_info struct. That would be a big change, even if it could be done
with string replacement.

Thanks,
-Stolee
Duy Nguyen May 9, 2019, 9:56 a.m. UTC | #4
On Wed, May 8, 2019 at 10:52 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/8/2019 10:41 AM, Duy Nguyen wrote:
> > On Wed, May 8, 2019 at 9:07 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
> >>> Bitfield addresses cannot be passed around in a pointer. This makes it
> >>> hard to use parse-options to set/unset them. Turn this struct to
> >>> normal integers. This of course increases the size of this struct
> >>> multiple times, but since we only have a handful of rev_info variables
> >>> around, memory consumption is not at all a concern.
> >>
> >> I think you are right that this memory trade-off shouldn't be a problem.
> >>
> >> What worries me instead is that we are using an "internal" data structure
> >> for option parsing. Would it make more sense to create a struct for use
> >> in the parse_opts method and a method that translates those options into
> >> the bitfield in struct rev_info?
> >
> > But we are doing that now (option parsing) using the same data
> > structure. Why would changing from a custom parser to parse_options()
> > affect what fields it should or should not touch in rev_info? Genuine
> > question. Maybe you could elaborate more about "internal". I probably
> > missed something. Or maybe this is a good opportunity to separate
> > intermediate option parsing variables from the rest of rev_info?
>
> You're right. I was unclear.
>
> rev_info stores a lot of data. Some of the fields are important
> in-memory structures that are crucial to the workings of revision.c
> and are never used by builtin/rev-list.c. Combining this purpose
> with the option parsing seems smelly to me.
>
> Thinking more on it, I would prefer a more invasive change that may
> pay off in the long term. These options, along with the "starting list"
> values, could be extracted to a 'struct rev_options' that contains these
> integers and the commit list. Then your option parsing changes could be
> limited to a rev_options struct, which is later inserted into a rev_info
> struct during setup_revisions().
>
> Generally, the rev_info struct has too many members and could be split
> into smaller pieces according to purpose. I created the topo_walk_info
> struct as a way to not make the situation worse, but doesn't fix existing
> pain.
>
> My ramblings are mostly complaining about old code that grew organically
> across many many quality additions. It is definitely hard to understand
> the revision-walking code, and perhaps it would be easier to understand
> with a little more structure.

Thanks. This is much better.

> The biggest issue with my suggestion is that it requires changing the
> consumers of the options, as they would no longer live directly on the
> rev_info struct. That would be a big change, even if it could be done
> with string replacement.

I agree rev_info has grown "wild". It's quite ancient code. As you
noted, it's a big change. And since my series is already long (76
patches), I would rather focus on just one thing, rewriting the
parsing code with minimum changes to anything else, preferably retain
the exact same old behavior.

After that work is done (and no regression found), we could focus on
reorganizing rev_info, which could be quite "interesting". Some fields
may be overloaded with different purposes, which I just can't spend
time investigating now. There's also the problem with freeing
resources after rev-list is done, which I think we have ignored so
far.
Derrick Stolee May 9, 2019, 12:11 p.m. UTC | #5
On 5/9/2019 5:56 AM, Duy Nguyen wrote:
> On Wed, May 8, 2019 at 10:52 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> The biggest issue with my suggestion is that it requires changing the
>> consumers of the options, as they would no longer live directly on the
>> rev_info struct. That would be a big change, even if it could be done
>> with string replacement.
> 
> I agree rev_info has grown "wild". It's quite ancient code. As you
> noted, it's a big change. And since my series is already long (76
> patches), I would rather focus on just one thing, rewriting the
> parsing code with minimum changes to anything else, preferably retain
> the exact same old behavior.
> 
> After that work is done (and no regression found), we could focus on
> reorganizing rev_info, which could be quite "interesting". Some fields
> may be overloaded with different purposes, which I just can't spend
> time investigating now. There's also the problem with freeing
> resources after rev-list is done, which I think we have ignored so
> far.

Thanks for humoring me. I agree with your reasoning.

This series looks good to me.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/revision.h b/revision.h
index 4134dc6029..01e4c42274 100644
--- a/revision.h
+++ b/revision.h
@@ -107,95 +107,95 @@  struct rev_info {
 
 	unsigned int early_output;
 
-	unsigned int	ignore_missing:1,
-			ignore_missing_links:1;
+	unsigned int ignore_missing;
+	unsigned int ignore_missing_links;
 
 	/* Traversal flags */
-	unsigned int	dense:1,
-			prune:1,
-			no_walk:2,
-			remove_empty_trees:1,
-			simplify_history:1,
-			topo_order:1,
-			simplify_merges:1,
-			simplify_by_decoration:1,
-			single_worktree:1,
-			tag_objects:1,
-			tree_objects:1,
-			blob_objects:1,
-			verify_objects:1,
-			edge_hint:1,
-			edge_hint_aggressive:1,
-			limited:1,
-			unpacked:1,
-			boundary:2,
-			count:1,
-			left_right:1,
-			left_only:1,
-			right_only:1,
-			rewrite_parents:1,
-			print_parents:1,
-			show_decorations:1,
-			reverse:1,
-			reverse_output_stage:1,
-			cherry_pick:1,
-			cherry_mark:1,
-			bisect:1,
-			ancestry_path:1,
-			first_parent_only:1,
-			line_level_traverse:1,
-			tree_blobs_in_commit_order:1,
-
-			/*
-			 * Blobs are shown without regard for their existence.
-			 * But not so for trees: unless exclude_promisor_objects
-			 * is set and the tree in question is a promisor object;
-			 * OR ignore_missing_links is set, the revision walker
-			 * dies with a "bad tree object HASH" message when
-			 * encountering a missing tree. For callers that can
-			 * handle missing trees and want them to be filterable
-			 * and showable, set this to true. The revision walker
-			 * will filter and show such a missing tree as usual,
-			 * but will not attempt to recurse into this tree
-			 * object.
-			 */
-			do_not_die_on_missing_tree:1,
-
-			/* for internal use only */
-			exclude_promisor_objects:1;
+	unsigned int dense;
+	unsigned int prune;
+	unsigned int no_walk;
+	unsigned int remove_empty_trees;
+	unsigned int simplify_history;
+	unsigned int topo_order;
+	unsigned int simplify_merges;
+	unsigned int simplify_by_decoration;
+	unsigned int single_worktree;
+	unsigned int tag_objects;
+	unsigned int tree_objects;
+	unsigned int blob_objects;
+	unsigned int verify_objects;
+	unsigned int edge_hint;
+	unsigned int edge_hint_aggressive;
+	unsigned int limited;
+	unsigned int unpacked;
+	unsigned int boundary;
+	unsigned int count;
+	unsigned int left_right;
+	unsigned int left_only;
+	unsigned int right_only;
+	unsigned int rewrite_parents;
+	unsigned int print_parents;
+	unsigned int show_decorations;
+	unsigned int reverse;
+	unsigned int reverse_output_stage;
+	unsigned int cherry_pick;
+	unsigned int cherry_mark;
+	unsigned int bisect;
+	unsigned int ancestry_path;
+	unsigned int first_parent_only;
+	unsigned int line_level_traverse;
+	unsigned int tree_blobs_in_commit_order;
+
+	/*
+	 * Blobs are shown without regard for their existence.
+	 * But not so for trees: unless exclude_promisor_objects
+	 * is set and the tree in question is a promisor object;
+	 * OR ignore_missing_links is set, the revision walker
+	 * dies with a "bad tree object HASH" message when
+	 * encountering a missing tree. For callers that can
+	 * handle missing trees and want them to be filterable
+	 * and showable, set this to true. The revision walker
+	 * will filter and show such a missing tree as usual,
+	 * but will not attempt to recurse into this tree
+	 * object.
+	 */
+	unsigned int do_not_die_on_missing_tree;
+
+	/* for internal use only */
+	unsigned int exclude_promisor_objects;
 
 	/* Diff flags */
-	unsigned int	diff:1,
-			full_diff:1,
-			show_root_diff:1,
-			no_commit_id:1,
-			verbose_header:1,
-			ignore_merges:1,
-			combine_merges:1,
-			combined_all_paths:1,
-			dense_combined_merges:1,
-			always_show_header:1;
+	unsigned int diff;
+	unsigned int full_diff;
+	unsigned int show_root_diff;
+	unsigned int no_commit_id;
+	unsigned int verbose_header;
+	unsigned int ignore_merges;
+	unsigned int combine_merges;
+	unsigned int combined_all_paths;
+	unsigned int dense_combined_merges;
+	unsigned int always_show_header;
 
 	/* Format info */
-	unsigned int	shown_one:1,
-			shown_dashes:1,
-			show_merge:1,
-			show_notes:1,
-			show_notes_given:1,
-			show_signature:1,
-			pretty_given:1,
-			abbrev_commit:1,
-			abbrev_commit_given:1,
-			zero_commit:1,
-			use_terminator:1,
-			missing_newline:1,
-			date_mode_explicit:1,
-			preserve_subject:1;
-	unsigned int	disable_stdin:1;
+	unsigned int shown_one;
+	unsigned int shown_dashes;
+	unsigned int show_merge;
+	unsigned int show_notes;
+	unsigned int show_notes_given;
+	unsigned int show_signature;
+	unsigned int pretty_given;
+	unsigned int abbrev_commit;
+	unsigned int abbrev_commit_given;
+	unsigned int zero_commit;
+	unsigned int use_terminator;
+	unsigned int missing_newline;
+	unsigned int date_mode_explicit;
+	unsigned int preserve_subject;
+	unsigned int disable_stdin;
 	/* --show-linear-break */
-	unsigned int	track_linear:1,
-			track_first_time:1,
-			linear:1;
+	unsigned int track_linear;
+	unsigned int track_first_time;
+	unsigned int linear;
 
 	struct date_mode date_mode;
 	int		expand_tabs_in_log; /* unset if negative */