[v2,06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list
diff mbox series

Message ID 20181027212930.9303-7-alban.gruin@gmail.com
State New
Headers show
Series
  • sequencer: refactor functions working on a todo_list
Related show

Commit Message

Alban Gruin Oct. 27, 2018, 9:29 p.m. UTC
This refactors sequencer_add_exec_commands() to work on a todo_list to
avoid redundant reads and writes to the disk.

An obvious way to do this would be to insert the `exec' command between
the other commands, and reparse it once this is done.  This is not what
is done here.  Instead, the command is appended to the buffer once, and
a new list of items is created.  Items from the old list are copied to
it, and new `exec' items are appended when necessary.  This eliminates
the need to reparse the todo list, but this also means its buffer cannot
be directly written to the disk, hence todo_list_write_to_disk().

sequencer_add_exec_commands() still reads the todo list from the disk,
as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
todo_list structure, and reparses it at the end.

complete_action() still uses sequencer_add_exec_commands() for now.
This will be changed in a future commit.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 sequencer.c | 69 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 20 deletions(-)

Comments

Phillip Wood Oct. 30, 2018, 4:47 p.m. UTC | #1
On 27/10/2018 22:29, Alban Gruin wrote:
> This refactors sequencer_add_exec_commands() to work on a todo_list to
> avoid redundant reads and writes to the disk.
> 
> An obvious way to do this would be to insert the `exec' command between
> the other commands, and reparse it once this is done.  This is not what
> is done here.  Instead, the command is appended to the buffer once, and
> a new list of items is created.  Items from the old list are copied to
> it, and new `exec' items are appended when necessary.  This eliminates
> the need to reparse the todo list, but this also means its buffer cannot
> be directly written to the disk, hence todo_list_write_to_disk().

I'd reword this slightly, maybe

Instead of just inserting the `exec' command between the other commands, 
and re-parsing the buffer at the end the exec command is appended to the 
buffer once, and a new list of items is created.  Items from the old 
list are copied across and new `exec' items are appended when necessary. 
  This eliminates the need to reparse the buffer, but this also means we 
have to use todo_list_write_to_disk() to write the file.

> sequencer_add_exec_commands() still reads the todo list from the disk,
> as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
> todo_list structure, and reparses it at the end.

I think the saying 'reparses' is confusing as that is what we're trying 
to avoid.

> complete_action() still uses sequencer_add_exec_commands() for now.
> This will be changed in a future commit.
> 
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>   sequencer.c | 69 +++++++++++++++++++++++++++++++++++++----------------
>   1 file changed, 49 insertions(+), 20 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index e12860c047..12a3efeca8 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc, const char **argv,
>   	return 0;
>   }
>   
> +static void todo_list_add_exec_commands(struct todo_list *todo_list,
> +					const char *commands)
> +{
> +	struct strbuf *buf = &todo_list->buf;
> +	const char *old_buf = buf->buf;
> +	size_t commands_len = strlen(commands + strlen("exec ")) - 1;
> +	int i, first = 1, nr = 0, alloc = 0;

Minor nit pick, I think it is clearer if first is initialized just 
before the loop as it is in the deleted code below.

> +	struct todo_item *items = NULL,
> +		base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
> +
> +	strbuf_addstr(buf, commands);
> +	base_item.offset_in_buf = buf->len - commands_len - 1;
> +	base_item.arg = buf->buf + base_item.offset_in_buf;

I think if the user gives --exec more than once on the command line then 
commands will contain more than one exec command so this needs to parse 
commands and create one todo_item for each command.

> +
> +	/*
> +	 * Insert <commands> after every pick. Here, fixup/squash chains
> +	 * are considered part of the pick, so we insert the commands *after*
> +	 * those chains if there are any.
> +	 */
> +	for (i = 0; i < todo_list->nr; i++) {
> +		enum todo_command command = todo_list->items[i].command;
> +		if (todo_list->items[i].arg)
> +			todo_list->items[i].arg = todo_list->items[i].arg - old_buf + buf->buf;
> +
> +		if (command == TODO_PICK && !first) {
> +			ALLOC_GROW(items, nr + 1, alloc);
> +			memcpy(items + nr++, &base_item, sizeof(struct todo_item));

I think it would be clearer to say
	items[nr++] = base_item;
rather than using memcpy. This applies below and to some of the other 
patches as well. Also this needs to loop over all the base_items if the 
user gave --exec more than once on the command line.

Best Wishes

Phillip

> +		}
> +
> +		ALLOC_GROW(items, nr + 1, alloc);
> +		memcpy(items + nr++, todo_list->items + i, sizeof(struct todo_item));
> +		first = 0;
> +	}
> +
> +	/* insert or append final <commands> */
> +	ALLOC_GROW(items, nr + 1, alloc);
> +	memcpy(items + nr++, &base_item, sizeof(struct todo_item));
> +
> +	FREE_AND_NULL(todo_list->items);
> +	todo_list->items = items;
> +	todo_list->nr = nr;
> +	todo_list->alloc = alloc;
> +}
> +
>   /*
>    * Add commands after pick and (series of) squash/fixup commands
>    * in the todo list.
> @@ -4224,10 +4268,7 @@ int sequencer_add_exec_commands(const char *commands)
>   {
>   	const char *todo_file = rebase_path_todo();
>   	struct todo_list todo_list = TODO_LIST_INIT;
> -	struct todo_item *item;
> -	struct strbuf *buf = &todo_list.buf;
> -	size_t offset = 0, commands_len = strlen(commands);
> -	int i, first;
> +	int res;
>   
>   	if (strbuf_read_file(&todo_list.buf, todo_file, 0) < 0)
>   		return error(_("could not read '%s'."), todo_file);
> @@ -4237,23 +4278,11 @@ int sequencer_add_exec_commands(const char *commands)
>   		return error(_("unusable todo list: '%s'"), todo_file);
>   	}
>   
> -	first = 1;
> -	/* insert <commands> before every pick except the first one */
> -	for (item = todo_list.items, i = 0; i < todo_list.nr; i++, item++) {
> -		if (item->command == TODO_PICK && !first) {
> -			strbuf_insert(buf, item->offset_in_buf + offset,
> -				      commands, commands_len);
> -			offset += commands_len;
> -		}
> -		first = 0;
> -	}
> -
> -	/* append final <commands> */
> -	strbuf_add(buf, commands, commands_len);
> -
> -	i = write_message(buf->buf, buf->len, todo_file, 0);
> +	todo_list_add_exec_commands(&todo_list, commands);
> +	res = todo_list_write_to_file(&todo_list, todo_file, NULL, NULL, 0, 0, -1, 0);
>   	todo_list_release(&todo_list);
> -	return i;
> +
> +	return res;
>   }
>   
>   static void todo_list_to_strbuf(struct todo_list *todo_list, struct strbuf *buf,
>
Alban Gruin Nov. 1, 2018, 11:31 p.m. UTC | #2
Le 30/10/2018 à 17:47, Phillip Wood a écrit :
> On 27/10/2018 22:29, Alban Gruin wrote:
>> This refactors sequencer_add_exec_commands() to work on a todo_list to
>> avoid redundant reads and writes to the disk.
>>
>> An obvious way to do this would be to insert the `exec' command between
>> the other commands, and reparse it once this is done.  This is not what
>> is done here.  Instead, the command is appended to the buffer once, and
>> a new list of items is created.  Items from the old list are copied to
>> it, and new `exec' items are appended when necessary.  This eliminates
>> the need to reparse the todo list, but this also means its buffer cannot
>> be directly written to the disk, hence todo_list_write_to_disk().
> 
> I'd reword this slightly, maybe
> 
> Instead of just inserting the `exec' command between the other commands,
> and re-parsing the buffer at the end the exec command is appended to the
> buffer once, and a new list of items is created.  Items from the old
> list are copied across and new `exec' items are appended when necessary.
>  This eliminates the need to reparse the buffer, but this also means we
> have to use todo_list_write_to_disk() to write the file.
> 
>> sequencer_add_exec_commands() still reads the todo list from the disk,
>> as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
>> todo_list structure, and reparses it at the end.
> 
> I think the saying 'reparses' is confusing as that is what we're trying
> to avoid.
> 
>> complete_action() still uses sequencer_add_exec_commands() for now.
>> This will be changed in a future commit.
>>
>> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
>> ---
>>   sequencer.c | 69 +++++++++++++++++++++++++++++++++++++----------------
>>   1 file changed, 49 insertions(+), 20 deletions(-)
>>
>> diff --git a/sequencer.c b/sequencer.c
>> index e12860c047..12a3efeca8 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc,
>> const char **argv,
>>       return 0;
>>   }
>>   +static void todo_list_add_exec_commands(struct todo_list *todo_list,
>> +                    const char *commands)
>> +{
>> +    struct strbuf *buf = &todo_list->buf;
>> +    const char *old_buf = buf->buf;
>> +    size_t commands_len = strlen(commands + strlen("exec ")) - 1;
>> +    int i, first = 1, nr = 0, alloc = 0;
> 
> Minor nit pick, I think it is clearer if first is initialized just
> before the loop as it is in the deleted code below.
> 
>> +    struct todo_item *items = NULL,
>> +        base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>> +
>> +    strbuf_addstr(buf, commands);
>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
> 
> I think if the user gives --exec more than once on the command line then
> commands will contain more than one exec command so this needs to parse
> commands and create one todo_item for each command.
> 

Ouch, you’re right.  Thanks for the heads up.

>> +
>> +    /*
>> +     * Insert <commands> after every pick. Here, fixup/squash chains
>> +     * are considered part of the pick, so we insert the commands
>> *after*
>> +     * those chains if there are any.
>> +     */
>> +    for (i = 0; i < todo_list->nr; i++) {
>> +        enum todo_command command = todo_list->items[i].command;
>> +        if (todo_list->items[i].arg)
>> +            todo_list->items[i].arg = todo_list->items[i].arg -
>> old_buf + buf->buf;
>> +
>> +        if (command == TODO_PICK && !first) {
>> +            ALLOC_GROW(items, nr + 1, alloc);
>> +            memcpy(items + nr++, &base_item, sizeof(struct todo_item));
> 
> I think it would be clearer to say
>     items[nr++] = base_item;
> rather than using memcpy. This applies below and to some of the other
> patches as well. Also this needs to loop over all the base_items if the
> user gave --exec more than once on the command line.
> 

I agree with you, it’s way more readable, IMO.  But for some reason, I
thought it was not possible to assign a struct to another in C.

> Best Wishes
> 
> Phillip
> 

Cheers,
Alban
Phillip Wood Nov. 2, 2018, 10:09 a.m. UTC | #3
Hi Alban

On 01/11/2018 23:31, Alban Gruin wrote:
> Le 30/10/2018 à 17:47, Phillip Wood a écrit :
>> On 27/10/2018 22:29, Alban Gruin wrote:
>>> This refactors sequencer_add_exec_commands() to work on a todo_list to
>>> avoid redundant reads and writes to the disk.
>>>
>>> An obvious way to do this would be to insert the `exec' command between
>>> the other commands, and reparse it once this is done.  This is not what
>>> is done here.  Instead, the command is appended to the buffer once, and
>>> a new list of items is created.  Items from the old list are copied to
>>> it, and new `exec' items are appended when necessary.  This eliminates
>>> the need to reparse the todo list, but this also means its buffer cannot
>>> be directly written to the disk, hence todo_list_write_to_disk().
>>
>> I'd reword this slightly, maybe
>>
>> Instead of just inserting the `exec' command between the other commands,
>> and re-parsing the buffer at the end the exec command is appended to the
>> buffer once, and a new list of items is created.  Items from the old
>> list are copied across and new `exec' items are appended when necessary.
>>  This eliminates the need to reparse the buffer, but this also means we
>> have to use todo_list_write_to_disk() to write the file.
>>
>>> sequencer_add_exec_commands() still reads the todo list from the disk,
>>> as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
>>> todo_list structure, and reparses it at the end.
>>
>> I think the saying 'reparses' is confusing as that is what we're trying
>> to avoid.
>>
>>> complete_action() still uses sequencer_add_exec_commands() for now.
>>> This will be changed in a future commit.
>>>
>>> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
>>> ---
>>>   sequencer.c | 69 +++++++++++++++++++++++++++++++++++++----------------
>>>   1 file changed, 49 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/sequencer.c b/sequencer.c
>>> index e12860c047..12a3efeca8 100644
>>> --- a/sequencer.c
>>> +++ b/sequencer.c
>>> @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc,
>>> const char **argv,
>>>       return 0;
>>>   }
>>>   +static void todo_list_add_exec_commands(struct todo_list *todo_list,
>>> +                    const char *commands)
>>> +{
>>> +    struct strbuf *buf = &todo_list->buf;
>>> +    const char *old_buf = buf->buf;
>>> +    size_t commands_len = strlen(commands + strlen("exec ")) - 1;
>>> +    int i, first = 1, nr = 0, alloc = 0;
>>
>> Minor nit pick, I think it is clearer if first is initialized just
>> before the loop as it is in the deleted code below.
>>
>>> +    struct todo_item *items = NULL,
>>> +        base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>> +
>>> +    strbuf_addstr(buf, commands);
>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>
>> I think if the user gives --exec more than once on the command line then
>> commands will contain more than one exec command so this needs to parse
>> commands and create one todo_item for each command.
>>
> 
> Ouch, you’re right.  Thanks for the heads up.

I haven't looked how difficult it would be but it might be best to
change the option parsing to pass an array of strings containing the
exec commands rather than one long string so we can just loop over the
array here.

> 
>>> +
>>> +    /*
>>> +     * Insert <commands> after every pick. Here, fixup/squash chains
>>> +     * are considered part of the pick, so we insert the commands
>>> *after*
>>> +     * those chains if there are any.
>>> +     */
>>> +    for (i = 0; i < todo_list->nr; i++) {
>>> +        enum todo_command command = todo_list->items[i].command;
>>> +        if (todo_list->items[i].arg)
>>> +            todo_list->items[i].arg = todo_list->items[i].arg -
>>> old_buf + buf->buf;
>>> +
>>> +        if (command == TODO_PICK && !first) {
>>> +            ALLOC_GROW(items, nr + 1, alloc);
>>> +            memcpy(items + nr++, &base_item, sizeof(struct todo_item));
>>
>> I think it would be clearer to say
>>     items[nr++] = base_item;
>> rather than using memcpy. This applies below and to some of the other
>> patches as well. Also this needs to loop over all the base_items if the
>> user gave --exec more than once on the command line.
>>
> 
> I agree with you, it’s way more readable, IMO.  But for some reason, I
> thought it was not possible to assign a struct to another in C.

In general one needs to be careful as it is only does a shallow copy but
that is exactly what we want here. Having said that if we have an array
of exec commands to insert then it might be worth sticking with memcpy()
here so the whole array can be copied in one go.

Best Wishes

Phillip

>> Best Wishes
>>
>> Phillip
>>
> 
> Cheers,
> Alban
>
Alban Gruin Nov. 2, 2018, 4:26 p.m. UTC | #4
Hi Phillip,

Le 02/11/2018 à 11:09, Phillip Wood a écrit :
>>>> +    struct todo_item *items = NULL,
>>>> +        base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>>> +
>>>> +    strbuf_addstr(buf, commands);
>>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>>
>>> I think if the user gives --exec more than once on the command line then
>>> commands will contain more than one exec command so this needs to parse
>>> commands and create one todo_item for each command.
>>>
>>
>> Ouch, you’re right.  Thanks for the heads up.
> 
> I haven't looked how difficult it would be but it might be best to
> change the option parsing to pass an array of strings containing the
> exec commands rather than one long string so we can just loop over the
> array here.
> 

It would be the best way to do so.  This string comes from git-rebase.sh
(or builtin/rebase.c) -- they format it this way before invoking
git-rebase--interactive.  So either I modify both of them (for this, I
would need to rebase my branch on master), or I can split this string in
builtin/rebase--interactive.c.  I prefer the first option, but maybe
changing the base of this series will not please Junio.

Cheers,
Alban
Phillip Wood Nov. 2, 2018, 5:09 p.m. UTC | #5
Hi Alban

On 02/11/2018 16:26, Alban Gruin wrote:
> Hi Phillip,
> 
> Le 02/11/2018 à 11:09, Phillip Wood a écrit :
>>>>> +    struct todo_item *items = NULL,
>>>>> +        base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>>>> +
>>>>> +    strbuf_addstr(buf, commands);
>>>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>>>
>>>> I think if the user gives --exec more than once on the command line then
>>>> commands will contain more than one exec command so this needs to parse
>>>> commands and create one todo_item for each command.
>>>>
>>>
>>> Ouch, you’re right.  Thanks for the heads up.
>>
>> I haven't looked how difficult it would be but it might be best to
>> change the option parsing to pass an array of strings containing the
>> exec commands rather than one long string so we can just loop over the
>> array here.
>>
> 
> It would be the best way to do so.  This string comes from git-rebase.sh
> (or builtin/rebase.c) -- they format it this way before invoking
> git-rebase--interactive.  So either I modify both of them (for this, I
> would need to rebase my branch on master), or I can split this string in
> builtin/rebase--interactive.c.  I prefer the first option, but maybe
> changing the base of this series will not please Junio.

I think in the last 'what's cooking' email Junio said he was going to 
merge all the builtin/rebase.c stuff to master so there may not be a 
problem if you wait a couple of days.

Best Wishes

Phillip

> Cheers,
> Alban
>

Patch
diff mbox series

diff --git a/sequencer.c b/sequencer.c
index e12860c047..12a3efeca8 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4216,6 +4216,50 @@  int sequencer_make_script(FILE *out, int argc, const char **argv,
 	return 0;
 }
 
+static void todo_list_add_exec_commands(struct todo_list *todo_list,
+					const char *commands)
+{
+	struct strbuf *buf = &todo_list->buf;
+	const char *old_buf = buf->buf;
+	size_t commands_len = strlen(commands + strlen("exec ")) - 1;
+	int i, first = 1, nr = 0, alloc = 0;
+	struct todo_item *items = NULL,
+		base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
+
+	strbuf_addstr(buf, commands);
+	base_item.offset_in_buf = buf->len - commands_len - 1;
+	base_item.arg = buf->buf + base_item.offset_in_buf;
+
+	/*
+	 * Insert <commands> after every pick. Here, fixup/squash chains
+	 * are considered part of the pick, so we insert the commands *after*
+	 * those chains if there are any.
+	 */
+	for (i = 0; i < todo_list->nr; i++) {
+		enum todo_command command = todo_list->items[i].command;
+		if (todo_list->items[i].arg)
+			todo_list->items[i].arg = todo_list->items[i].arg - old_buf + buf->buf;
+
+		if (command == TODO_PICK && !first) {
+			ALLOC_GROW(items, nr + 1, alloc);
+			memcpy(items + nr++, &base_item, sizeof(struct todo_item));
+		}
+
+		ALLOC_GROW(items, nr + 1, alloc);
+		memcpy(items + nr++, todo_list->items + i, sizeof(struct todo_item));
+		first = 0;
+	}
+
+	/* insert or append final <commands> */
+	ALLOC_GROW(items, nr + 1, alloc);
+	memcpy(items + nr++, &base_item, sizeof(struct todo_item));
+
+	FREE_AND_NULL(todo_list->items);
+	todo_list->items = items;
+	todo_list->nr = nr;
+	todo_list->alloc = alloc;
+}
+
 /*
  * Add commands after pick and (series of) squash/fixup commands
  * in the todo list.
@@ -4224,10 +4268,7 @@  int sequencer_add_exec_commands(const char *commands)
 {
 	const char *todo_file = rebase_path_todo();
 	struct todo_list todo_list = TODO_LIST_INIT;
-	struct todo_item *item;
-	struct strbuf *buf = &todo_list.buf;
-	size_t offset = 0, commands_len = strlen(commands);
-	int i, first;
+	int res;
 
 	if (strbuf_read_file(&todo_list.buf, todo_file, 0) < 0)
 		return error(_("could not read '%s'."), todo_file);
@@ -4237,23 +4278,11 @@  int sequencer_add_exec_commands(const char *commands)
 		return error(_("unusable todo list: '%s'"), todo_file);
 	}
 
-	first = 1;
-	/* insert <commands> before every pick except the first one */
-	for (item = todo_list.items, i = 0; i < todo_list.nr; i++, item++) {
-		if (item->command == TODO_PICK && !first) {
-			strbuf_insert(buf, item->offset_in_buf + offset,
-				      commands, commands_len);
-			offset += commands_len;
-		}
-		first = 0;
-	}
-
-	/* append final <commands> */
-	strbuf_add(buf, commands, commands_len);
-
-	i = write_message(buf->buf, buf->len, todo_file, 0);
+	todo_list_add_exec_commands(&todo_list, commands);
+	res = todo_list_write_to_file(&todo_list, todo_file, NULL, NULL, 0, 0, -1, 0);
 	todo_list_release(&todo_list);
-	return i;
+
+	return res;
 }
 
 static void todo_list_to_strbuf(struct todo_list *todo_list, struct strbuf *buf,