diff mbox series

git-gui: Perform rescan on window focus-in

Message ID 20190728151726.9188-1-me@yadavpratyush.com
State New, archived
Headers show
Series git-gui: Perform rescan on window focus-in | expand

Commit Message

Pratyush Yadav July 28, 2019, 3:17 p.m. UTC
If any changes are made to the tree while git-gui is open, the user has
to manually rescan to see those changes in the gui. With this change, a
rescan will be performed whenever the window comes in focus, removing
the need for manual rescans in most cases. A manual rescan will still be
needed when something makes changes to the tree while git-gui is still
in focus.

Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
---
 git-gui/git-gui.sh | 1 +
 1 file changed, 1 insertion(+)

Comments

brian m. carlson July 28, 2019, 9:36 p.m. UTC | #1
On 2019-07-28 at 15:17:26, Pratyush Yadav wrote:
> If any changes are made to the tree while git-gui is open, the user has
> to manually rescan to see those changes in the gui. With this change, a
> rescan will be performed whenever the window comes in focus, removing
> the need for manual rescans in most cases. A manual rescan will still be
> needed when something makes changes to the tree while git-gui is still
> in focus.

I don't use git-gui, so I have no opinion on this change either way, but
I do have some questions.

What exactly is involved in a rescan? Is it potentially expensive? If
so, it might be better to leave it explicit, since people can
accidentally give focus to the wrong window (bad click, focus follows
mouse, etc.).

Is there ever a situation in which someone might want to see the old
state? For example, would a rescan change the active commit shown?
Someone might be looking at a particular commit message or object ID for
the current commit; would this interfere with that?

> diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
> index 6de74ce639..8ca2033dc8 100755
> --- a/git-gui/git-gui.sh
> +++ b/git-gui/git-gui.sh
> @@ -3849,6 +3849,7 @@ if {[is_enabled transport]} {
>  }
>  
>  bind .   <Key-F5>     ui_do_rescan
> +bind .   <FocusIn>    do_rescan

What's the difference between these two? Why are we using do_rescan
instead of ui_do_rescan?

>  bind .   <$M1B-Key-r> ui_do_rescan
>  bind .   <$M1B-Key-R> ui_do_rescan
>  bind .   <$M1B-Key-s> do_signoff

The answers to a lot of these questions can go in your commit message to
help reviewers and future readers understand your change better.
Pratyush Yadav July 28, 2019, 9:44 p.m. UTC | #2
Playing with it for some time, I see some problems with the patch.

Firstly, "bind ." binds that event for the root window and all 
subwindows. This means we rescan multiple times on a single FocusIn 
event, once for every window. This can be fixed easily with a check for 
the root window (aka "."). I'll send a v2 patch with this change.

Secondly, when the context menu is opened via right click, and then 
closed, it is recorded as a FocusIn event for our root window, even 
though the context menu is a child of the root window. This triggers a 
rescan. Unfortunately, I could not find a way to go around this behaviour.

This means some unnecessary rescans, but the rescan looks like a pretty 
fast operation even on my slow hard disk, so it should probably not be a 
problem.

If someone knows a way around the second issue, please let me know.

On 28/07/19 8:47 PM, Pratyush Yadav wrote:
> If any changes are made to the tree while git-gui is open, the user has
> to manually rescan to see those changes in the gui. With this change, a
> rescan will be performed whenever the window comes in focus, removing
> the need for manual rescans in most cases. A manual rescan will still be
> needed when something makes changes to the tree while git-gui is still
> in focus.
> 
> Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
> ---
>   git-gui/git-gui.sh | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
> index 6de74ce639..8ca2033dc8 100755
> --- a/git-gui/git-gui.sh
> +++ b/git-gui/git-gui.sh
> @@ -3849,6 +3849,7 @@ if {[is_enabled transport]} {
>   }
>   
>   bind .   <Key-F5>     ui_do_rescan
> +bind .   <FocusIn>    do_rescan
>   bind .   <$M1B-Key-r> ui_do_rescan
>   bind .   <$M1B-Key-R> ui_do_rescan
>   bind .   <$M1B-Key-s> do_signoff
>
Pratyush Yadav July 28, 2019, 10:10 p.m. UTC | #3
Hi Brian,

On 29/07/19 3:06 AM, brian m. carlson wrote:
> On 2019-07-28 at 15:17:26, Pratyush Yadav wrote:
>> If any changes are made to the tree while git-gui is open, the user has
>> to manually rescan to see those changes in the gui. With this change, a
>> rescan will be performed whenever the window comes in focus, removing
>> the need for manual rescans in most cases. A manual rescan will still be
>> needed when something makes changes to the tree while git-gui is still
>> in focus.
> 
> I don't use git-gui, so I have no opinion on this change either way, but
> I do have some questions.
> 
> What exactly is involved in a rescan? Is it potentially expensive? If
> so, it might be better to leave it explicit, since people can
> accidentally give focus to the wrong window (bad click, focus follows
> mouse, etc.).
> 

The function is not documented, and I only started spelunking the code a 
couple days back, so I'll try to answer with what I know. It might not 
be the full picture.

Running git-gui --trace, these commands are executed during a rescan:

/usr/lib/git-core/git-rev-parse --verify HEAD
/usr/lib/git-core/git-update-index -q --unmerged --ignore-missing --refresh

Since I'm not too familiar with the details of these, I'll let you be 
the judge on how expensive these operations are. But I'll add that 
rescans are pretty fast on my relatively slow hard disk.

> Is there ever a situation in which someone might want to see the old
> state? For example, would a rescan change the active commit shown?
> Someone might be looking at a particular commit message or object ID for
> the current commit; would this interfere with that?

At least in my workflow, there is no such situation. I use git-gui to 
look at uncommitted changes, and stage and commit them. To look at older 
commits, I use gitk. And from what I understand, git-gui is not designed 
to browse old commits. In the options menu, it simply opens gitk if you 
click "Visualise master's history". There is no history browsing in 
git-gui itself.

Yes, a rescan will remove the old commit and will show the latest 
changes. This includes the old diff that was there before it was 
committed or removed from outside of git-gui. So this change will 
certainly interfere with the old state.

> 
>> diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
>> index 6de74ce639..8ca2033dc8 100755
>> --- a/git-gui/git-gui.sh
>> +++ b/git-gui/git-gui.sh
>> @@ -3849,6 +3849,7 @@ if {[is_enabled transport]} {
>>   }
>>   
>>   bind .   <Key-F5>     ui_do_rescan
>> +bind .   <FocusIn>    do_rescan
> 
> What's the difference between these two? Why are we using do_rescan
> instead of ui_do_rescan?
> 

ui_do_rescan changes the focus to the first diff. It is executed when 
you press F5 or choose Rescan from the menu. do_rescan does not do that.

Resetting to first diff on focus change will get annoying when you are 
in the middle of looking at some other file. do_rescan just updates the 
software state without changing what file you are looking at or where in 
that file you are looking at.

>>   bind .   <$M1B-Key-r> ui_do_rescan
>>   bind .   <$M1B-Key-R> ui_do_rescan
>>   bind .   <$M1B-Key-s> do_signoff
> 
> The answers to a lot of these questions can go in your commit message to
> help reviewers and future readers understand your change better.
> 

I'm never too sure what I should put in the commit message, so I took 
the conservative route. I'll add more details in the v2 patch.
Pratyush Yadav July 28, 2019, 10:32 p.m. UTC | #4
On 29/07/19 3:40 AM, Pratyush Yadav wrote:
> Hi Brian,
> 
> On 29/07/19 3:06 AM, brian m. carlson wrote:
>> On 2019-07-28 at 15:17:26, Pratyush Yadav wrote:
>>> If any changes are made to the tree while git-gui is open, the user has
>>> to manually rescan to see those changes in the gui. With this change, a
>>> rescan will be performed whenever the window comes in focus, removing
>>> the need for manual rescans in most cases. A manual rescan will still be
>>> needed when something makes changes to the tree while git-gui is still
>>> in focus.
>>
>> I don't use git-gui, so I have no opinion on this change either way, but
>> I do have some questions.
>>
>> What exactly is involved in a rescan? Is it potentially expensive? If
>> so, it might be better to leave it explicit, since people can
>> accidentally give focus to the wrong window (bad click, focus follows
>> mouse, etc.).
>>
> 
> The function is not documented, and I only started spelunking the code a 
> couple days back, so I'll try to answer with what I know. It might not 
> be the full picture.
> 
> Running git-gui --trace, these commands are executed during a rescan:
> 
> /usr/lib/git-core/git-rev-parse --verify HEAD
> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing --refresh
> 
> Since I'm not too familiar with the details of these, I'll let you be 
> the judge on how expensive these operations are. But I'll add that 
> rescans are pretty fast on my relatively slow hard disk.
> 
>> Is there ever a situation in which someone might want to see the old
>> state? For example, would a rescan change the active commit shown?
>> Someone might be looking at a particular commit message or object ID for
>> the current commit; would this interfere with that?
> 
> At least in my workflow, there is no such situation. I use git-gui to 
> look at uncommitted changes, and stage and commit them. To look at older 
> commits, I use gitk. And from what I understand, git-gui is not designed 
> to browse old commits. In the options menu, it simply opens gitk if you 
> click "Visualise master's history". There is no history browsing in 
> git-gui itself.
> 
> Yes, a rescan will remove the old commit and will show the latest 
> changes. This includes the old diff that was there before it was 
> committed or removed from outside of git-gui. So this change will 
> certainly interfere with the old state.

To add to this, not rescanning has unwanted side effects too. Say you 
stage a file, and just as you are about to commit you suddenly remember 
one small thing you forgot. So you add that change, and jump back into 
git-gui. It shows your file as staged, and the unstaged changes section 
is empty. So you commit the change. Hit refresh and your last minute 
change didn't make it in the commit, even though the unstaged changes 
list was empty. This can cause some confusion, especially among newcomers.

 From my perspective, there is more to gain from auto rescanning than 
there is to lose.

> 
>>
>>> diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
>>> index 6de74ce639..8ca2033dc8 100755
>>> --- a/git-gui/git-gui.sh
>>> +++ b/git-gui/git-gui.sh
>>> @@ -3849,6 +3849,7 @@ if {[is_enabled transport]} {
>>>   }
>>>   bind .   <Key-F5>     ui_do_rescan
>>> +bind .   <FocusIn>    do_rescan
>>
>> What's the difference between these two? Why are we using do_rescan
>> instead of ui_do_rescan?
>>
> 
> ui_do_rescan changes the focus to the first diff. It is executed when 
> you press F5 or choose Rescan from the menu. do_rescan does not do that.
> 
> Resetting to first diff on focus change will get annoying when you are 
> in the middle of looking at some other file. do_rescan just updates the 
> software state without changing what file you are looking at or where in 
> that file you are looking at.
> 
>>>   bind .   <$M1B-Key-r> ui_do_rescan
>>>   bind .   <$M1B-Key-R> ui_do_rescan
>>>   bind .   <$M1B-Key-s> do_signoff
>>
>> The answers to a lot of these questions can go in your commit message to
>> help reviewers and future readers understand your change better.
>>
> 
> I'm never too sure what I should put in the commit message, so I took 
> the conservative route. I'll add more details in the v2 patch.
>
brian m. carlson July 28, 2019, 10:49 p.m. UTC | #5
On 2019-07-28 at 22:10:29, Pratyush Yadav wrote:
> The function is not documented, and I only started spelunking the code a
> couple days back, so I'll try to answer with what I know. It might not be
> the full picture.
> 
> Running git-gui --trace, these commands are executed during a rescan:
> 
> /usr/lib/git-core/git-rev-parse --verify HEAD
> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing --refresh
> 
> Since I'm not too familiar with the details of these, I'll let you be the
> judge on how expensive these operations are. But I'll add that rescans are
> pretty fast on my relatively slow hard disk.

These are probably pretty cheap on all but the largest repositories. I
was worried we were enumerating all refs or all history or something
like that.

> ui_do_rescan changes the focus to the first diff. It is executed when you
> press F5 or choose Rescan from the menu. do_rescan does not do that.
> 
> Resetting to first diff on focus change will get annoying when you are in
> the middle of looking at some other file. do_rescan just updates the
> software state without changing what file you are looking at or where in
> that file you are looking at.

Yeah, this definitely seems like the right move.

> I'm never too sure what I should put in the commit message, so I took the
> conservative route. I'll add more details in the v2 patch.

Great. This sounds like a well-reasoned change. I'll let other folks who
use git-gui more chime in to see what they think as well.
Mark Levedahl July 29, 2019, 2:24 a.m. UTC | #6
On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at 22:10:29, 
Pratyush Yadav wrote:
 >> The function is not documented, and I only started spelunking the code a
 >> couple days back, so I'll try to answer with what I know. It might 
not be
 >> the full picture.
 >>
 >> Running git-gui --trace, these commands are executed during a rescan:
 >>
 >> /usr/lib/git-core/git-rev-parse --verify HEAD
 >> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing 
--refresh
 >>
 >
 > Great. This sounds like a well-reasoned change. I'll let other folks who
 > use git-gui more chime in to see what they think as well.
 >

I'm assuming this does what is currently done by F5.
A simple rescan from git-gui in the git repository takes about 8 seconds 
on my corporate laptop running Windows, making this happen on change of 
window focus is definitely not a friendly change from my view point.

Mark
Mark Levedahl July 29, 2019, 2:26 a.m. UTC | #7
On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at 22:10:29, 
Pratyush Yadav wrote:
 >> The function is not documented, and I only started spelunking the code a
 >> couple days back, so I'll try to answer with what I know. It might 
not be
 >> the full picture.
 >>
 >> Running git-gui --trace, these commands are executed during a rescan:
 >>
 >> /usr/lib/git-core/git-rev-parse --verify HEAD
 >> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing 
--refresh
 >>
 >
 > Great. This sounds like a well-reasoned change. I'll let other folks who
 > use git-gui more chime in to see what they think as well.
 >

I'm assuming this does what is currently done by F5.
A simple rescan from git-gui in the git repository takes about 8 seconds 
on my corporate laptop running Windows, making this happen on change of 
window focus is definitely not a friendly change from my view point.

Mark
Mark Levedahl July 29, 2019, 2:28 a.m. UTC | #8
On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at 22:10:29, 
Pratyush Yadav wrote:
 >> The function is not documented, and I only started spelunking the code a
 >> couple days back, so I'll try to answer with what I know. It might 
not be
 >> the full picture.
 >>
 >> Running git-gui --trace, these commands are executed during a rescan:
 >>
 >> /usr/lib/git-core/git-rev-parse --verify HEAD
 >> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing 
--refresh
 >>
 >
 > Great. This sounds like a well-reasoned change. I'll let other folks who
 > use git-gui more chime in to see what they think as well.
 >

I'm assuming this does what is currently done by F5.
A simple rescan from git-gui in the git repository takes about 8 seconds 
on my corporate laptop running Windows, making this happen on change of 
window focus is definitely not a friendly change from my view point.

Mark
Junio C Hamano July 29, 2019, 5:09 a.m. UTC | #9
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> These are probably pretty cheap on all but the largest repositories. I
> was worried we were enumerating all refs or all history or something
> like that.
>
>> ui_do_rescan changes the focus to the first diff. It is executed when you
>> press F5 or choose Rescan from the menu. do_rescan does not do that.
>> 
>> Resetting to first diff on focus change will get annoying when you are in
>> the middle of looking at some other file. do_rescan just updates the
>> software state without changing what file you are looking at or where in
>> that file you are looking at.
>
> Yeah, this definitely seems like the right move.

"Right move" in the sense that it would try not to change what is
being shown too much.  Rescan will still not be without cost, so it
will be annoying if it happens when the user did not make any
change.

And it is annoying even more, if the user did make change in another
window.  You may make a change perhaps from the command line, write
a short e-mail about it to let others know in your MUA, and then
switch the focus back to git-gui to continue working.  Refreshing
upon git-gui getting focus is no better than manually pressing F5 or
whatever at that point.  It is too late at that point for spending
extra cycles without being asked without getting annoying to the user.

The right time to spend cycles (in the background) in the above
sample sequence is immediately after you made a change and switch to
your MUA---while you are typing a few paragraphs, you would not mind
git-gui spending seconds to repaint.

So probably a more productive use of our time, if we were to futz
with git-gui, would be to figure out how git-gui can have an
background idle process that notices a change in the repository and
refreshes but only when you are *not* interacting with it (if it
does things while you are interacting with it, it would become
annoying and distracting), I would guess.  Just when you come back
is the worst, and the most annoying, time to auto-refresh.
Pratyush Yadav July 29, 2019, 8:15 a.m. UTC | #10
On 29/07/19 7:58 AM, Mark Levedahl wrote:
> On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at 22:10:29, 
> Pratyush Yadav wrote:
>  >> The function is not documented, and I only started spelunking the 
> code a
>  >> couple days back, so I'll try to answer with what I know. It might 
> not be
>  >> the full picture.
>  >>
>  >> Running git-gui --trace, these commands are executed during a rescan:
>  >>
>  >> /usr/lib/git-core/git-rev-parse --verify HEAD
>  >> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing 
> --refresh
>  >>
>  >
>  > Great. This sounds like a well-reasoned change. I'll let other folks who
>  > use git-gui more chime in to see what they think as well.
>  >
> 
> I'm assuming this does what is currently done by F5.
> A simple rescan from git-gui in the git repository takes about 8 seconds 
> on my corporate laptop running Windows, making this happen on change of 
> window focus is definitely not a friendly change from my view point.
> 

This is a Windows problem maybe? On my Linux machine with an almost dead 
hard drive, it takes under 10ms to do a refresh on the git repository 
(which has about 56,000 commits).

Can you check what takes so long? I don't use Windows or I'd do this myself.

Go to git-gui.sh and add some prints to the rescan procedure (line 1450) 
to trace which operation takes time. It is not a shell script, but a tcl 
script, so you'd need to add "puts <your_string_here>" just to have an 
approximate idea of what takes so long.

Can someone else who uses git-gui comment on the time taken to rescan on 
the git repo (on Windows, Linux, or Mac)?

You can use this to see how long it takes to do a rescan. It will print 
the time taken to stdout:

-- >8 --
diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index 8ca2033dc8..7f2962f060 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -2360,7 +2360,11 @@ proc do_rescan {} {
  }

  proc ui_do_rescan {} {
+	set t [clock milliseconds]
  	rescan {force_first_diff ui_ready}
+	set t2 [clock milliseconds]
+
+	puts "Time elapsed in ui_do_rescan = [expr $t2 - $t]ms"
  }

  proc do_commit {} {
-- >8 --
Pratyush Yadav July 29, 2019, 8:44 a.m. UTC | #11
On 29/07/19 10:39 AM, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
>> These are probably pretty cheap on all but the largest repositories. I
>> was worried we were enumerating all refs or all history or something
>> like that.
>>
>>> ui_do_rescan changes the focus to the first diff. It is executed when you
>>> press F5 or choose Rescan from the menu. do_rescan does not do that.
>>>
>>> Resetting to first diff on focus change will get annoying when you are in
>>> the middle of looking at some other file. do_rescan just updates the
>>> software state without changing what file you are looking at or where in
>>> that file you are looking at.
>>
>> Yeah, this definitely seems like the right move.
> 
> "Right move" in the sense that it would try not to change what is
> being shown too much.  Rescan will still not be without cost, so it
> will be annoying if it happens when the user did not make any
> change.
> 
> And it is annoying even more, if the user did make change in another
> window.  You may make a change perhaps from the command line, write
> a short e-mail about it to let others know in your MUA, and then
> switch the focus back to git-gui to continue working.  Refreshing
> upon git-gui getting focus is no better than manually pressing F5 or
> whatever at that point.  It is too late at that point for spending
> extra cycles without being asked without getting annoying to the user.

On my system with a not-so-fast hard disk, it takes under 10ms to do a 
rescan (4 to 7ms usually). Using it for some time, I never felt any 
latency or lag. Maybe it will cause problems on other systems, but I'd 
like to hear from more people about it.

Also, why is it no better than manually refreshing? Manually refreshing 
gets annoying real quick (and that's why I sent this change in the first 
place). Not having to hit refresh every time you make any change is 
pretty sweet.

Yes it maybe doesn't matter as much when you open git-gui _after_ you're 
done making changes, and now only have to write the commit message. But 
if you keep it open while you are editing code, and then when you are 
done, you switch to it to commit, auto refresh is a great quality of 
life improvement.

Again, on my system at least, there is no noticeable latency when 
switching focus. How about making rescan on focus optional? The user can 
disable it if they feel like it is getting in their way.

> 
> The right time to spend cycles (in the background) in the above
> sample sequence is immediately after you made a change and switch to
> your MUA---while you are typing a few paragraphs, you would not mind
> git-gui spending seconds to repaint.

FYI, we are not just repainting, we are also syncing the repo state with 
the in-memory state when doing a rescan.

> 
> So probably a more productive use of our time, if we were to futz
> with git-gui, would be to figure out how git-gui can have an
> background idle process that notices a change in the repository and
> refreshes but only when you are *not* interacting with it (if it
> does things while you are interacting with it, it would become
> annoying and distracting), I would guess.  Just when you come back
> is the worst, and the most annoying, time to auto-refresh.
> 

So maybe something like inotify? tcl-inotify [0] (a tcl extension) can 
be used for it I suppose, though I haven't looked into it much yet.

My idea with this change was to keep it as minimal as possible. Doing a 
rescan on focus seems like a good compromise to me. I can look into 
inotify, but I'm not too aware of the performance impact it can have.

Also, do you know how editors like Atom or VSCode refresh when there is 
a change to the tree? I've seen them auto-update the GUI when you make 
any changes from outside of them.

I will look into tcl-inotify if there is a consensus that it is the 
better way to go about this. It seems like a significant time 
investment, and I'd rather not spend a lot of time doing something that 
would get rejected instantly.

[0] http://tcl-inotify.sourceforge.net/
Johannes Schindelin July 31, 2019, 7:42 p.m. UTC | #12
Hi,

On Mon, 29 Jul 2019, Pratyush Yadav wrote:

> On 29/07/19 7:58 AM, Mark Levedahl wrote:
> > On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at
> > 22:10:29, Pratyush Yadav wrote:
> > > > The function is not documented, and I only started spelunking
> > > > the code a couple days back, so I'll try to answer with what I
> > > > know. It might not be the full picture.
> > > >
> > > > Running git-gui --trace, these commands are executed during a rescan:
> > > >
> > > > /usr/lib/git-core/git-rev-parse --verify HEAD
> > > > /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing --refresh
> > > >
> > >
> > > Great. This sounds like a well-reasoned change. I'll let other folks who
> > > use git-gui more chime in to see what they think as well.
> > >
> >
> > I'm assuming this does what is currently done by F5.
> > A simple rescan from git-gui in the git repository takes about 8 seconds on
> > my corporate laptop running Windows, making this happen on change of window
> > focus is definitely not a friendly change from my view point.
> >
>
> This is a Windows problem maybe? On my Linux machine with an almost dead hard
> drive, it takes under 10ms to do a refresh on the git repository (which has
> about 56,000 commits).

I would be _extremely_ cautious to base an argument on one particular
setup, using on particular hardware with one particular OS and one
particular repository.

When it comes to repositories that are worked on actively, git.git is
not actually a representative example, it is way smaller than what users
deal with.

You might be one of those developers privileged enough to have a fast
computer. Trying to extrapolate from such a vantage point does the rest
of us Git users a big disservice.

At this point, I am gently inclined against the presented approach, in
particular given that even context menus reportedly trigger the re-scan
(which I suspect might actually be a Linux-only issue, as context menus
are top-level windows on X11, at least if I remember correctly, and I
also seem to remember that they are dependent windows on Aqua and Win32,
just to add yet another argument against overfitting considerations onto
a single, specific setup).

Ciao,
Johannes
Pratyush Yadav Aug. 1, 2019, 9:52 p.m. UTC | #13
+Junio

On 8/1/19 1:12 AM, Johannes Schindelin wrote:
> Hi,
> 
> On Mon, 29 Jul 2019, Pratyush Yadav wrote:
> 
>> On 29/07/19 7:58 AM, Mark Levedahl wrote:
>>> On 7/28/19 6:49 PM, brian m. carlson wrote:> On 2019-07-28 at
>>> 22:10:29, Pratyush Yadav wrote:
>>>>> The function is not documented, and I only started spelunking
>>>>> the code a couple days back, so I'll try to answer with what I
>>>>> know. It might not be the full picture.
>>>>>
>>>>> Running git-gui --trace, these commands are executed during a rescan:
>>>>>
>>>>> /usr/lib/git-core/git-rev-parse --verify HEAD
>>>>> /usr/lib/git-core/git-update-index -q --unmerged --ignore-missing --refresh
>>>>>
>>>>
>>>> Great. This sounds like a well-reasoned change. I'll let other folks who
>>>> use git-gui more chime in to see what they think as well.
>>>>
>>>
>>> I'm assuming this does what is currently done by F5.
>>> A simple rescan from git-gui in the git repository takes about 8 seconds on
>>> my corporate laptop running Windows, making this happen on change of window
>>> focus is definitely not a friendly change from my view point.
>>>
>>
>> This is a Windows problem maybe? On my Linux machine with an almost dead hard
>> drive, it takes under 10ms to do a refresh on the git repository (which has
>> about 56,000 commits).
> 
> I would be _extremely_ cautious to base an argument on one particular
> setup, using on particular hardware with one particular OS and one
> particular repository.
> 

Agreed. That's why I asked for benchmarks from other people. 
Unfortunately, no one replied.

I am worried about exactly this problem that other users will have 
performance problems. I usually reserve judgment till I see some actual 
benchmarks, but since in this case we aren't getting any, it is probably 
better to err on the side of caution.

> When it comes to repositories that are worked on actively, git.git is
> not actually a representative example, it is way smaller than what users
> deal with.

Out of curiosity, what would you consider large enough? The Linux kernel 
(855,753 commits as of writing this)?

> 
> You might be one of those developers privileged enough to have a fast
> computer. Trying to extrapolate from such a vantage point does the rest
> of us Git users a big disservice.

Yes I have a pretty powerful machine in the processor department, but I 
have a rather slow hard disk. I assumed most people would have faster 
hard disks than me, but in hindsight, that might not be a good 
assumption to make.

> 
> At this point, I am gently inclined against the presented approach, in
> particular given that even context menus reportedly trigger the re-scan
> (which I suspect might actually be a Linux-only issue, as context menus
> are top-level windows on X11, at least if I remember correctly, and I
> also seem to remember that they are dependent windows on Aqua and Win32,
> just to add yet another argument against overfitting considerations onto
> a single, specific setup).

All right, the patch in its current state can't fly. So what is the 
correct way to do this? I see the following options:

1. Add this as an option that is disabled by default, but people who 
don't mind it can enable it. This is the easiest to implement. But I 
leave it to you and Junio (and anyone else who wants to pitch in :)) to 
decide if it is a good idea.

2. Watch all files for changes. Atom does this for their git gui [0]. We 
can probably use watchman [1] for this, but this adds another external 
dependency.

3. Leave this feature out. I of course don't like this option very much, 
and will probably have to run a fork, but if it is better for the 
project, it is better for the project.

Which option do you want to go with?

Naturally, my favourite option is number 1 because it will be the 
easiest to implement ;)

[0] 
https://discuss.atom.io/t/how-does-atom-refresh-git-state-when-some-external-editor-changes-it/67210
[1] https://github.com/facebook/watchman
Johannes Schindelin Aug. 2, 2019, 12:39 p.m. UTC | #14
Hi,

On Fri, 2 Aug 2019, Pratyush Yadav wrote:

> On 8/1/19 1:12 AM, Johannes Schindelin wrote:
> >
> > I would be _extremely_ cautious to base an argument on one
> > particular setup, using on particular hardware with one particular
> > OS and one particular repository.
> >
>
> Agreed. That's why I asked for benchmarks from other people.
> Unfortunately, no one replied.

What stops _you_ from performing more tests yourself? There are tons of
real-world repositories out there, we even talk about options for large
repositories to test with in Git for Windows' Contributing Guidelines:
https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#performance-tests

> I am worried about exactly this problem that other users will have
> performance problems. I usually reserve judgment till I see some
> actual benchmarks, but since in this case we aren't getting any, it is
> probably better to err on the side of caution.

Well, you proposed it without testing it first e.g. on Kotlin or even
Linux.  Personally, I would have chosen Chromium, as it is a pretty big
one.

Maybe you want to do that now.

> > When it comes to repositories that are worked on actively, git.git is
> > not actually a representative example, it is way smaller than what users
> > deal with.
>
> Out of curiosity, what would you consider large enough? The Linux kernel
> (855,753 commits as of writing this)?

As your patch is about refreshing the Git GUI, the number of Git commits
is irrelevant. The number of worktree files is relevant.

Unfortunately, you cannot search for this dimension on GitHub, but you
can search for "repository size", which is at least loosely correlated:

https://github.com/search?utf8=%E2%9C%93&q=size%3A%3E5000000&type=Repositories&ref=advsearch&l=&l=

Personally, I would probably have chosen either of these, for testing:

* https://github.com/chromium/chromium
* https://github.com/WebKit/webkit
* https://github.com/raspberrypi/firmware
* https://github.com/MicrosoftDocs/azure-docs
* https://github.com/facebookresearch/FAIR-Play

> > At this point, I am gently inclined against the presented approach, in
> > particular given that even context menus reportedly trigger the re-scan
> > (which I suspect might actually be a Linux-only issue, as context menus
> > are top-level windows on X11, at least if I remember correctly, and I
> > also seem to remember that they are dependent windows on Aqua and Win32,
> > just to add yet another argument against overfitting considerations onto
> > a single, specific setup).
>
> All right, the patch in its current state can't fly. So what is the correct
> way to do this? I see the following options:
>
> 1. Add this as an option that is disabled by default, but people who don't
> mind it can enable it. This is the easiest to implement. But I leave it to you
> and Junio (and anyone else who wants to pitch in :)) to decide if it is a good
> idea.

That would probably be much easier to get accepted.

> 2. Watch all files for changes. Atom does this for their git gui [0]. We can
> probably use watchman [1] for this, but this adds another external dependency.

I am currently looking at watchman, and it seems that it has its own
performance issues in big repositories (for which it is actually most
relevant). Besides, Windows support is kinda flaky, so I would rule this
out (Git is supported on many more platforms than watchman supports).

Besides, what your patch wants to do is not to know when things have
changed. Your patch wants to refresh the UI at opportune moments, and it
is unclear how watchman could help decide when to refresh.

> 3. Leave this feature out. I of course don't like this option very much, and
> will probably have to run a fork, but if it is better for the project, it is
> better for the project.

That would indeed be the safest.

I wonder, however, whether you can think of a better method to figure
out when to auto-refresh. Focus seems to be a low-hanging fruit, but as
you noticed it is not very accurate. Maybe if you combine it with a
timeout? Or maybe you can detect idle time in Tcl/Tk?

Ciao,
Johannes
Junio C Hamano Aug. 2, 2019, 4:47 p.m. UTC | #15
Pratyush Yadav <me@yadavpratyush.com> writes:

> +Junio

I do not have a strong opinion on this one---a Meh by default means
a moderately strong preference for status-quo.

> All right, the patch in its current state can't fly. So what is the
> correct way to do this? I see the following options:
>
> 1. Add this as an option that is disabled by default, but people who
> don't mind it can enable it. This is the easiest to implement. But I
> leave it to you and Junio (and anyone else who wants to pitch in :))
> to decide if it is a good idea.

I think this is a good first step.  As I said already, I am not
convinced that "focus in" is a good heuristics for triggering auto
rescan, and I suspect that you or others may come up with and
replace it with a better heuristic over time.  During that
experiment, it would be better to allow interested others to opt
into the feature to help, while not disturbing ordinary users who
are OK with the current behaviour.
Pratyush Yadav Aug. 2, 2019, 8 p.m. UTC | #16
On 8/2/19 6:09 PM, Johannes Schindelin wrote:
> Hi,
> 
> On Fri, 2 Aug 2019, Pratyush Yadav wrote:
> 
>> On 8/1/19 1:12 AM, Johannes Schindelin wrote:
>>>
>>> I would be _extremely_ cautious to base an argument on one
>>> particular setup, using on particular hardware with one particular
>>> OS and one particular repository.
>>>
>>
>> Agreed. That's why I asked for benchmarks from other people.
>> Unfortunately, no one replied.
> 
> What stops _you_ from performing more tests yourself? There are tons of
> real-world repositories out there, we even talk about options for large
> repositories to test with in Git for Windows' Contributing Guidelines:
> https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#performance-tests

I thought the point was to not base all data off a single setup?

Anyway, I'll see if I can manage to clone the Chromium source (I have 
flaky internet), and I'll run some tests.

Aside: I looked into the resumeable clones discussion in the mailing 
list archives. I don't think the feature made it through. But the work 
seems too complicated for me to look into reviving it. Maybe some other day.

> 
>> I am worried about exactly this problem that other users will have
>> performance problems. I usually reserve judgment till I see some
>> actual benchmarks, but since in this case we aren't getting any, it is
>> probably better to err on the side of caution.
> 
> Well, you proposed it without testing it first e.g. on Kotlin or even
> Linux.  Personally, I would have chosen Chromium, as it is a pretty big
> one.

When I first proposed the patch, I thought the git repo was large 
enough. And testing on it was pretty fast for me.

> 
> Maybe you want to do that now.
> 
>>> When it comes to repositories that are worked on actively, git.git is
>>> not actually a representative example, it is way smaller than what users
>>> deal with.
>>
>> Out of curiosity, what would you consider large enough? The Linux kernel
>> (855,753 commits as of writing this)?
> 
> As your patch is about refreshing the Git GUI, the number of Git commits
> is irrelevant. The number of worktree files is relevant.
> 
> Unfortunately, you cannot search for this dimension on GitHub, but you
> can search for "repository size", which is at least loosely correlated:
> 
> https://github.com/search?utf8=%E2%9C%93&q=size%3A%3E5000000&type=Repositories&ref=advsearch&l=&l=
> 
> Personally, I would probably have chosen either of these, for testing:
> 
> * https://github.com/chromium/chromium
> * https://github.com/WebKit/webkit
> * https://github.com/raspberrypi/firmware
> * https://github.com/MicrosoftDocs/azure-docs
> * https://github.com/facebookresearch/FAIR-Play
> 
>>> At this point, I am gently inclined against the presented approach, in
>>> particular given that even context menus reportedly trigger the re-scan
>>> (which I suspect might actually be a Linux-only issue, as context menus
>>> are top-level windows on X11, at least if I remember correctly, and I
>>> also seem to remember that they are dependent windows on Aqua and Win32,
>>> just to add yet another argument against overfitting considerations onto
>>> a single, specific setup).
>>
>> All right, the patch in its current state can't fly. So what is the correct
>> way to do this? I see the following options:
>>
>> 1. Add this as an option that is disabled by default, but people who don't
>> mind it can enable it. This is the easiest to implement. But I leave it to you
>> and Junio (and anyone else who wants to pitch in :)) to decide if it is a good
>> idea.
> 
> That would probably be much easier to get accepted.
> 
>> 2. Watch all files for changes. Atom does this for their git gui [0]. We can
>> probably use watchman [1] for this, but this adds another external dependency.
> 
> I am currently looking at watchman, and it seems that it has its own
> performance issues in big repositories (for which it is actually most
> relevant). Besides, Windows support is kinda flaky, so I would rule this
> out (Git is supported on many more platforms than watchman supports).
> 
> Besides, what your patch wants to do is not to know when things have
> changed. Your patch wants to refresh the UI at opportune moments, and it
> is unclear how watchman could help decide when to refresh.
>

But we do want to know when things have changed, because that's when we 
are supposed to refresh. The most opportune moment to refresh is right 
when some file changes. The user is very likely to not be looking at git 
gui at that point because they'd probably in their editor, so we get to 
refresh in the background. Watchman lets us do this, but it is just an 
example. Any cross-platform method of watching a directory tree works 
just as well.

Maybe we can use Watchdog [0], which might be better (disclaimer: I 
haven't actually looked into watchdog, I'm just saying it maybe is 
better). It is an added dependency regardless.

A Linux-only alternative is tcl-inotify [1], but I don't know if you 
folks will like a platform dependent solution.

>> 3. Leave this feature out. I of course don't like this option very much, and
>> will probably have to run a fork, but if it is better for the project, it is
>> better for the project.
> 
> That would indeed be the safest.
> 
> I wonder, however, whether you can think of a better method to figure
> out when to auto-refresh. Focus seems to be a low-hanging fruit, but as
> you noticed it is not very accurate. Maybe if you combine it with a
> timeout? Or maybe you can detect idle time in Tcl/Tk?
> 

Hm, I don't see a better alternative than file system watches. Timeouts 
are a heuristic that can potentially be problematic. If you do a refresh 
too frequently, you hog up the user's resources for little benefit. You 
don't refresh frequently enough, then sometimes the user sees an updated 
view, sometimes a not-updated view. This inconsistency, if not fine 
tuned properly, can prove detrimental to UX.

If you are fine with the above inconsistency, refreshing every few 
seconds shouldn't be too difficult.

[0] https://pythonhosted.org/watchdog/
[1] http://tcl-inotify.sourceforge.net/
Pratyush Yadav Aug. 2, 2019, 8:13 p.m. UTC | #17
On 8/2/19 10:17 PM, Junio C Hamano wrote:
> Pratyush Yadav <me@yadavpratyush.com> writes:
> 
>> +Junio
> 
> I do not have a strong opinion on this one---a Meh by default means
> a moderately strong preference for status-quo.
> 
>> All right, the patch in its current state can't fly. So what is the
>> correct way to do this? I see the following options:
>>
>> 1. Add this as an option that is disabled by default, but people who
>> don't mind it can enable it. This is the easiest to implement. But I
>> leave it to you and Junio (and anyone else who wants to pitch in :))
>> to decide if it is a good idea.
> 
> I think this is a good first step.  As I said already, I am not
> convinced that "focus in" is a good heuristics for triggering auto
> rescan, and I suspect that you or others may come up with and
> replace it with a better heuristic over time.  During that
> experiment, it would be better to allow interested others to opt
> into the feature to help, while not disturbing ordinary users who
> are OK with the current behaviour.
> 

All right. I'll look a bit more to see if I can figure something better. 
I suggested using filesystem watches. I'll wait a bit to hear from 
Johannes on what he thinks about it. He doesn't like the idea of using 
Watchman apparently.

Otherwise, I'll send in an updated patch with this feature as opt-in.
Johannes Schindelin Aug. 3, 2019, 8:34 p.m. UTC | #18
Hi,

On Sat, 3 Aug 2019, Pratyush Yadav wrote:

> On 8/2/19 6:09 PM, Johannes Schindelin wrote:
> >
> > On Fri, 2 Aug 2019, Pratyush Yadav wrote:
> >
> > > On 8/1/19 1:12 AM, Johannes Schindelin wrote:
> > > >
> > > > I would be _extremely_ cautious to base an argument on one
> > > > particular setup, using on particular hardware with one
> > > > particular OS and one particular repository.
> > > >
> > >
> > > Agreed. That's why I asked for benchmarks from other people.
> > > Unfortunately, no one replied.
> >
> > What stops _you_ from performing more tests yourself? There are tons
> > of real-world repositories out there, we even talk about options for
> > large repositories to test with in Git for Windows' Contributing
> > Guidelines:
> > https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#performance-tests
>
> I thought the point was to not base all data off a single setup?

You misunderstood what I was saying: a single setup is bad, and you can
make it moderately better by testing _at least_ with a moderately-sized
repository [*1*] in addition to git.git.

So yes, it would still not be enough to test with, say, the git.git
_and_ the Chromium repository _only_ on your setup, but if not even you
can be bothered to test with more than one small repository, how can you
possibly expect anybody else to add more testing?

> [...]
> > I wonder, however, whether you can think of a better method to
> > figure out when to auto-refresh. Focus seems to be a low-hanging
> > fruit, but as you noticed it is not very accurate. Maybe if you
> > combine it with a timeout? Or maybe you can detect idle time in
> > Tcl/Tk?
>
> Hm, I don't see a better alternative than file system watches.
> Timeouts are a heuristic that can potentially be problematic.

Let me stress the fact that I suggested a timeout _in addition_ to the
focus event?

Yes, using a timeout on its own is stupidly problematic. That's why I
did not suggest that.

> If you do a refresh too frequently, you hog up the user's resources
> for little benefit.

Indeed. You want to find a heuristic that catches most of the cases
where files were changed, while at the same time not even _trying_ to
refresh automatically when probably nothing changed.

> You don't refresh frequently enough, then sometimes the user sees an
> updated view, sometimes a not-updated view. This inconsistency, if not
> fine tuned properly, can prove detrimental to UX.

Right, exactly. And of course, since you are not changing your own Git
fork only, but the Git that is the one that most users will use, you
will have to consider carefully in which direction to strike the balance
by default.

I would contend that you should err on the side of _not_ refreshing, as
that would probably affect more users negatively than the opposite.

Ciao,
Johannes

Footnote *1*: I don't expect you to test with the largest repositories,
as you are unlikely to have access to anything approaching the size of
the largest Git repository on the planet:
https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/
Pratyush Yadav Aug. 4, 2019, 12:53 p.m. UTC | #19
On 8/4/19 2:04 AM, Johannes Schindelin wrote:
> Hi,
> 
> On Sat, 3 Aug 2019, Pratyush Yadav wrote:
> 
>> On 8/2/19 6:09 PM, Johannes Schindelin wrote:
>>>
>>> On Fri, 2 Aug 2019, Pratyush Yadav wrote:
>>>
>>>> On 8/1/19 1:12 AM, Johannes Schindelin wrote:
>>>>>
>>>>> I would be _extremely_ cautious to base an argument on one
>>>>> particular setup, using on particular hardware with one
>>>>> particular OS and one particular repository.
>>>>>
>>>>
>>>> Agreed. That's why I asked for benchmarks from other people.
>>>> Unfortunately, no one replied.
>>>
>>> What stops _you_ from performing more tests yourself? There are tons
>>> of real-world repositories out there, we even talk about options for
>>> large repositories to test with in Git for Windows' Contributing
>>> Guidelines:
>>> https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#performance-tests
>>
>> I thought the point was to not base all data off a single setup?
> 
> You misunderstood what I was saying: a single setup is bad, and you can
> make it moderately better by testing _at least_ with a moderately-sized
> repository [*1*] in addition to git.git.
> 
> So yes, it would still not be enough to test with, say, the git.git
> _and_ the Chromium repository _only_ on your setup, but if not even you
> can be bothered to test with more than one small repository, how can you
> possibly expect anybody else to add more testing?

All right, I'll see what repos I can test.

But my internet is pretty slow and unstable, so my clone of the Chromium 
repo failed mid-way multiple times. I assume we need to test on a large 
index, so is it all right if I use t/perf/repos/many-files.sh to 
artificially generate a large repo?

> 
>> [...]
>>> I wonder, however, whether you can think of a better method to
>>> figure out when to auto-refresh. Focus seems to be a low-hanging
>>> fruit, but as you noticed it is not very accurate. Maybe if you
>>> combine it with a timeout? Or maybe you can detect idle time in
>>> Tcl/Tk?
>>
>> Hm, I don't see a better alternative than file system watches.
>> Timeouts are a heuristic that can potentially be problematic.
> 
> Let me stress the fact that I suggested a timeout _in addition_ to the
> focus event?

Oh, my bad. I thought you suggested using timeouts exclusively.

But I'm not sure I understand what you mean by "using timeouts in 
addition to the focus event". My guess is that you mean we should 
activate a refresh-on-focus-in only after git-gui has been out of focus 
for a certain amount of time. Is my guess correct?

> 
> Yes, using a timeout on its own is stupidly problematic. That's why I
> did not suggest that.
> 
>> If you do a refresh too frequently, you hog up the user's resources
>> for little benefit.
> 
> Indeed. You want to find a heuristic that catches most of the cases
> where files were changed, while at the same time not even _trying_ to
> refresh automatically when probably nothing changed.

Like I said before, the best way of doing that that I can see is file 
system watches. But maybe we can get reasonable performance with a 
combination of timeouts and focus events.

> 
>> You don't refresh frequently enough, then sometimes the user sees an
>> updated view, sometimes a not-updated view. This inconsistency, if not
>> fine tuned properly, can prove detrimental to UX.
> 
> Right, exactly. And of course, since you are not changing your own Git
> fork only, but the Git that is the one that most users will use, you
> will have to consider carefully in which direction to strike the balance
> by default.

Yep, that's why I'm engaging you in this conversation :). I'm personally 
pretty happy with this change for my own use, but I'd like everyone 
enjoy it equally.

> 
> I would contend that you should err on the side of _not_ refreshing, as
> that would probably affect more users negatively than the opposite.
> 
> Ciao,
> Johannes
> 
> Footnote *1*: I don't expect you to test with the largest repositories,
> as you are unlikely to have access to anything approaching the size of
> the largest Git repository on the planet:
> https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/
> 

Ah yes, I read about it a while back on Reddit. Having a huge monolithic 
repo sounds backwards to me. Using submodules sounds like a better idea, 
but who am I to judge. They probably have their reasons that I'm not 
aware of.
Johannes Schindelin Aug. 4, 2019, 6:56 p.m. UTC | #20
Hi,

On Sat, 3 Aug 2019, Pratyush Yadav wrote:

> On 8/2/19 10:17 PM, Junio C Hamano wrote:
> > Pratyush Yadav <me@yadavpratyush.com> writes:
> >
> > > All right, the patch in its current state can't fly. So what is
> > > the correct way to do this? I see the following options:
> > >
> > > 1. Add this as an option that is disabled by default, but people
> > > who don't mind it can enable it. This is the easiest to implement.
> > > But I leave it to you and Junio (and anyone else who wants to
> > > pitch in :)) to decide if it is a good idea.
> >
> > I think this is a good first step.  As I said already, I am not
> > convinced that "focus in" is a good heuristics for triggering auto
> > rescan, and I suspect that you or others may come up with and
> > replace it with a better heuristic over time.  During that
> > experiment, it would be better to allow interested others to opt
> > into the feature to help, while not disturbing ordinary users who
> > are OK with the current behaviour.
> >
>
> All right. I'll look a bit more to see if I can figure something
> better. I suggested using filesystem watches. I'll wait a bit to hear
> from Johannes on what he thinks about it. He doesn't like the idea of
> using Watchman apparently.

It's not that I don't like the idea of watchman. What I don't like is
the very limited number of scenarios where you can use watchman. It
essentially boils down to... macOS.

From what I can tell, watchman really works best on macOS, it might work
well on Linux, and it is in an "eternal beta" on Windows because none of
the active developers have access to (or need for support on) Windows.

Also, using a filesystem monitor is quite a lot of overkill here, you
only want to refresh the index smartly and pro-actively. Now you pile
more and more complexity on top of the original patch, for little
obvious benefit (and for added difficulties setting this up, because you
can now no longer make this a default, as you have no idea whether
watchman is installed or not, whether it works as intended or not if it
is installed, and whether users will start to hate you for forcing this
down their throats).

I want you to take a step back, consider what problem you originally
tried to solve, and then come up with the simplest solution you can
possibly come up with. If your solution is not simple, reject it.

I _could_ imagine, for example, that a focus-out event, followed by a
configurable timeout (say, with the default being something like 10
seconds, which seems to me like a sensible minimum number of seconds to
change anything commit-worthy), could be used by a focus-in event to
trigger a refresh (with a message in the status bar what is happening
and why), and that that would strike a sensible balance between benefit
and complexity.

Ciao,
Johannes
Johannes Schindelin Aug. 4, 2019, 7:10 p.m. UTC | #21
Hi,

On Sun, 4 Aug 2019, Pratyush Yadav wrote:

> On 8/4/19 2:04 AM, Johannes Schindelin wrote:
> >
> > On Sat, 3 Aug 2019, Pratyush Yadav wrote:
> >
> > > On 8/2/19 6:09 PM, Johannes Schindelin wrote:
> > > >
> > > > On Fri, 2 Aug 2019, Pratyush Yadav wrote:
> > > >
> > > > > On 8/1/19 1:12 AM, Johannes Schindelin wrote:
> > > > > >
> > > > > > I would be _extremely_ cautious to base an argument on one
> > > > > > particular setup, using on particular hardware with one
> > > > > > particular OS and one particular repository.
> > > > > >
> > > > >
> > > > > Agreed. That's why I asked for benchmarks from other people.
> > > > > Unfortunately, no one replied.
> > > >
> > > > What stops _you_ from performing more tests yourself? There are
> > > > tons of real-world repositories out there, we even talk about
> > > > options for large repositories to test with in Git for Windows'
> > > > Contributing Guidelines:
> > > > https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#performance-tests
> > >
> > > I thought the point was to not base all data off a single setup?
> >
> > You misunderstood what I was saying: a single setup is bad, and you
> > can make it moderately better by testing _at least_ with a
> > moderately-sized repository [*1*] in addition to git.git.
> >
> > So yes, it would still not be enough to test with, say, the git.git
> > _and_ the Chromium repository _only_ on your setup, but if not even
> > you can be bothered to test with more than one small repository, how
> > can you possibly expect anybody else to add more testing?
>
> All right, I'll see what repos I can test.
>
> But my internet is pretty slow and unstable, so my clone of the
> Chromium repo failed mid-way multiple times. I assume we need to test
> on a large index, so is it all right if I use
> t/perf/repos/many-files.sh to artificially generate a large repo?

Why do you ask me for permission to just try this? I feel very
uncomfortable being put in such a position: I am not your manager or
gate-keeper or anything.

> > > [...]
> > > > I wonder, however, whether you can think of a better method to
> > > > figure out when to auto-refresh. Focus seems to be a low-hanging
> > > > fruit, but as you noticed it is not very accurate. Maybe if you
> > > > combine it with a timeout? Or maybe you can detect idle time in
> > > > Tcl/Tk?
> > >
> > > Hm, I don't see a better alternative than file system watches.
> > > Timeouts are a heuristic that can potentially be problematic.
> >
> > Let me stress the fact that I suggested a timeout _in addition_ to the
> > focus event?
>
> Oh, my bad. I thought you suggested using timeouts exclusively.
>
> But I'm not sure I understand what you mean by "using timeouts in addition to
> the focus event". My guess is that you mean we should activate a
> refresh-on-focus-in only after git-gui has been out of focus for a certain
> amount of time. Is my guess correct?

I am _not_ telling you what strategy you should use. You really need to
come up with hypotheses about what tell-tales for committable outside
changes could be easy to detect. This is your patch, and your project.

My suggestion about a time-out was to think a bit further than just mere
Tk-provided events to detect whether the user might have changed
anything outside of Git GUI that might make an automatic refresh
convenient for the user.

I do _not_ want to engage in this project, it is not my pet peeve.

> > Yes, using a timeout on its own is stupidly problematic. That's why I
> > did not suggest that.
> >
> > > If you do a refresh too frequently, you hog up the user's resources
> > > for little benefit.
> >
> > Indeed. You want to find a heuristic that catches most of the cases
> > where files were changed, while at the same time not even _trying_ to
> > refresh automatically when probably nothing changed.
>
> Like I said before, the best way of doing that that I can see is file system
> watches.

That's not a heuristic.

A file system monitor is doing a lot of work in this case, for dubitable
benefit.

Take for example git.git: Let's say that I run a specific test. It
creates a directory under `t/`: the filesystem monitor triggers. It
creates a repository in that directory: the filesystem monitor triggers
_multiple times_. The test then performs all kinds of I/O, eventually
removing the directory when all tests passed.

Note that none of these filesystem changes correspond to anything that
would update _anything in Git GUI during a refresh.

Of course, this is something I did not mention before because I took it
for granted that you would always try to weigh the benefits of your
approach to the worst possible unintended consequences.

> But maybe we can get reasonable performance with a combination of
> timeouts and focus events.

Please note that I would not be surprised if this heuristic _also_
resulted in a lot of bad, unintended consequences. That's for you to
find out.

> > Footnote *1*: I don't expect you to test with the largest repositories,
> > as you are unlikely to have access to anything approaching the size of
> > the largest Git repository on the planet:
> > https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/
> >
>
> Ah yes, I read about it a while back on Reddit. Having a huge monolithic repo
> sounds backwards to me. Using submodules sounds like a better idea, but who am
> I to judge. They probably have their reasons that I'm not aware of.

This statement just sounds to me as if you never have used submodules in
any serious way. My experience is that software developers who tried to
use submodules offer opinions that read very differently from that
paragraph.

Strong opinions usually do not survive contact with open-minded exposure
to reality.

Ciao,
Johannes
Pratyush Yadav Aug. 4, 2019, 8:17 p.m. UTC | #22
Hi Johannes,

I sense that I'm starting to test your patience. So first off, thanks 
for the advice you've given so far. It has been really valuable. Feel 
free to let me know if you don't want me to prod you further about this, 
and I'll stop :).

Also let me know if you don't want me to cc you in other git-gui patches 
I send.


On 8/5/19 12:40 AM, Johannes Schindelin wrote:
> Hi,
> 
> On Sun, 4 Aug 2019, Pratyush Yadav wrote:
> 
>> On 8/4/19 2:04 AM, Johannes Schindelin wrote:
>>>
>>> On Sat, 3 Aug 2019, Pratyush Yadav wrote:
>>>
[...]
>>>
>>> You misunderstood what I was saying: a single setup is bad, and you
>>> can make it moderately better by testing _at least_ with a
>>> moderately-sized repository [*1*] in addition to git.git.
>>>
>>> So yes, it would still not be enough to test with, say, the git.git
>>> _and_ the Chromium repository _only_ on your setup, but if not even
>>> you can be bothered to test with more than one small repository, how
>>> can you possibly expect anybody else to add more testing?
>>
>> All right, I'll see what repos I can test.
>>
>> But my internet is pretty slow and unstable, so my clone of the
>> Chromium repo failed mid-way multiple times. I assume we need to test
>> on a large index, so is it all right if I use
>> t/perf/repos/many-files.sh to artificially generate a large repo?
> 
> Why do you ask me for permission to just try this? I feel very
> uncomfortable being put in such a position: I am not your manager or
> gate-keeper or anything.

I'm not a native speaker, so maybe I worded it incorrectly. What I meant 
by the question was not to ask permission, but to verify that this would 
be a good test for checking performance with a large repo.

So I'll rephrase the question:
Will using t/perf/repos/many-files.sh, instead of something like the 
Chromium repo, be a good enough test of performance of a rescan?

> 
>>>> [...]
>>>>> I wonder, however, whether you can think of a better method to
>>>>> figure out when to auto-refresh. Focus seems to be a low-hanging
>>>>> fruit, but as you noticed it is not very accurate. Maybe if you
>>>>> combine it with a timeout? Or maybe you can detect idle time in
>>>>> Tcl/Tk?
>>>>
>>>> Hm, I don't see a better alternative than file system watches.
>>>> Timeouts are a heuristic that can potentially be problematic.
>>>
>>> Let me stress the fact that I suggested a timeout _in addition_ to the
>>> focus event?
>>
>> Oh, my bad. I thought you suggested using timeouts exclusively.
>>
>> But I'm not sure I understand what you mean by "using timeouts in addition to
>> the focus event". My guess is that you mean we should activate a
>> refresh-on-focus-in only after git-gui has been out of focus for a certain
>> amount of time. Is my guess correct?
> 
> I am _not_ telling you what strategy you should use. You really need to
> come up with hypotheses about what tell-tales for committable outside
> changes could be easy to detect. This is your patch, and your project.

IDEs and some text editors like Atom use file system watches, and that's 
why I think they are a viable approach. But at the same time, these are 
also known to be kind of heavy-weight and slow, and I like git-gui for 
being fast and lightweight, so that might not be a good idea.

> 
> My suggestion about a time-out was to think a bit further than just mere
> Tk-provided events to detect whether the user might have changed
> anything outside of Git GUI that might make an automatic refresh
> convenient for the user.
> 
> I do _not_ want to engage in this project, it is not my pet peeve.
Understandable. We are all volunteers after all. Thanks for your help so 
far :)

> 
>>> Yes, using a timeout on its own is stupidly problematic. That's why I
>>> did not suggest that.
>>>
>>>> If you do a refresh too frequently, you hog up the user's resources
>>>> for little benefit.
>>>
>>> Indeed. You want to find a heuristic that catches most of the cases
>>> where files were changed, while at the same time not even _trying_ to
>>> refresh automatically when probably nothing changed.
>>
>> Like I said before, the best way of doing that that I can see is file system
>> watches.
> 
> That's not a heuristic.
> 
> A file system monitor is doing a lot of work in this case, for dubitable
> benefit.
> 
> Take for example git.git: Let's say that I run a specific test. It
> creates a directory under `t/`: the filesystem monitor triggers. It
> creates a repository in that directory: the filesystem monitor triggers
> _multiple times_. The test then performs all kinds of I/O, eventually
> removing the directory when all tests passed.

The monitor triggers multiple times, but we don't have to do a rescan 
every time. The callback for the trigger would just set a boolean that 
will decide whether a rescan has to be done the next time it comes back 
into focus.

> 
> Note that none of these filesystem changes correspond to anything that
> would update _anything in Git GUI during a refresh.
> 
> Of course, this is something I did not mention before because I took it
> for granted that you would always try to weigh the benefits of your
> approach to the worst possible unintended consequences.

Yes, the specific use case you mentioned would trigger an unnecessary 
refresh, but I don't think many people do these kinds of things very often.

Building the tree will generate a lot of IO, but we will only watch 
files tracked by git, so all that IO would not trigger a refresh at all.

I'm not saying file system watches are the best alternative here, I'm 
just saying it is a pretty good alternative, and the scenario you 
describe doesn't happen often enough to bother most people. One problem 
with them is, of course, added dependencies. And it's a big problem. I'd 
love to have a tcl-only alternative, but there is none at the moment.

I'll experiment with some file system watches, and see how they fare.

> 
>> But maybe we can get reasonable performance with a combination of
>> timeouts and focus events.
> 
> Please note that I would not be surprised if this heuristic _also_
> resulted in a lot of bad, unintended consequences. That's for you to
> find out.

Yes, of course. I was just thinking out loud.

> 
>>> Footnote *1*: I don't expect you to test with the largest repositories,
>>> as you are unlikely to have access to anything approaching the size of
>>> the largest Git repository on the planet:
>>> https://devblogs.microsoft.com/bharry/the-largest-git-repo-on-the-planet/
>>>
>>
>> Ah yes, I read about it a while back on Reddit. Having a huge monolithic repo
>> sounds backwards to me. Using submodules sounds like a better idea, but who am
>> I to judge. They probably have their reasons that I'm not aware of.
> 
> This statement just sounds to me as if you never have used submodules in
> any serious way. My experience is that software developers who tried to
> use submodules offer opinions that read very differently from that
> paragraph.

My last workplace had Linux, U-Boot, ARM Trusted Firmware, and a handful 
more projects under one repo with submodules. I don't remember having 
any problems, and hence why I thought it was a pretty decent idea.

> 
> Strong opinions usually do not survive contact with open-minded exposure
> to reality.

"but who am I to judge. They probably have their reasons that I'm not 
aware of."

I just put my first thoughts out about this, admitting I probably don't 
have the full picture. I see that as being open-minded.
diff mbox series

Patch

diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh
index 6de74ce639..8ca2033dc8 100755
--- a/git-gui/git-gui.sh
+++ b/git-gui/git-gui.sh
@@ -3849,6 +3849,7 @@  if {[is_enabled transport]} {
 }
 
 bind .   <Key-F5>     ui_do_rescan
+bind .   <FocusIn>    do_rescan
 bind .   <$M1B-Key-r> ui_do_rescan
 bind .   <$M1B-Key-R> ui_do_rescan
 bind .   <$M1B-Key-s> do_signoff