[2/2] dim: Try to gc the rr-cache

Message ID 20170712121224.18522-2-daniel.vetter@ffwll.ch
State New

Commit Message

Daniel Vetter July 12, 2017, 12:12 p.m. UTC
The problem is that we have a distributed cache: every committer has
a copy. This means even a slight clock skew will make a naive gc
algorithm thrash a lot.

To fix this, add a huge hysteresis: only add files newer than 1 day,
and only remove them when they are older than 60 days. As long as people
have reasonably accurate clocks on their machines, this should work.
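
To illustrate the hysteresis, here is a minimal sketch and not the code in dim. The function name `rr_cache_action` and the idea of passing an age in whole days are assumptions made for the example; only the 1-day and 60-day thresholds come from the description above. Anything between the two thresholds is left alone, so a clock that is off by a few hours cannot flip an entry from "add" to "remove".

```shell
# Hypothetical helper: classify a cache entry by its age in whole days.
# Thresholds follow the commit message; the function itself is illustrative.
rr_cache_action() {
	if [ "$1" -lt 1 ]; then
		echo add        # newer than 1 day: eligible to be added
	elif [ "$1" -gt 60 ]; then
		echo remove     # older than 60 days: eligible for removal
	else
		echo keep       # dead band: neither added nor removed
	fi
}
```

For example, `rr_cache_action 30` prints `keep`.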

A different problem is that we can't use filesystem timestamps (and
hence can't use git rerere gc): when someone comes back from vacation
and updates git rerere, all the files will have current timestamps,
even when they were pushed out weeks ago. To fix that, use the git
log to decide which old files to remove. Also, remove old files before
adding new ones, to avoid confusion.
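
A sketch of the "use the git log" idea, assuming a hypothetical helper named `touched_within_days` (not a dim function). It asks git whether any commit within the given window touched the file, so the answer depends on commit dates rather than on the timestamps in the local checkout.

```shell
# Hypothetical helper: succeed if a commit reachable from HEAD touched
# file "$1" within the last "$2" days, judged by commit dates.
touched_within_days() {
	# -1 stops at the first match and --format=%H prints only its hash,
	# so the output is non-empty exactly when such a commit exists.
	[ -n "$(git log -1 --since="$2 days ago" --format=%H -- "$1")" ]
}
```

A caller could then write `touched_within_days "$file" 60 || git rm -- "$file"`.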

Also, we need to teach the cp -r to preserve timestamps, otherwise
this won't work.
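
As a small illustration of the timestamp-preserving copy, assuming GNU cp and a hypothetical wrapper called `copy_keep_times`: the copy keeps the modification time of each source file. Note that `--preserve=timestamps` covers modification and access times only; the inode change time (ctime) is always set by the kernel at copy time, so only a check based on mtime can observe the preserved value.

```shell
# Hypothetical helper: copy the contents of directory "$1" into "$2",
# keeping modification and access times. Assumes GNU cp.
copy_keep_times() {
	mkdir -p "$2"
	cp -r --preserve=timestamps "$1"/. "$2"/
}
```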

v2: Use git log to remove old files.

v3: Remove the debug uncommenting (Sean).

v4: Split out code movement and explain better what's going on (Jani).

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 dim | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Sean Paul July 13, 2017, 8:16 p.m. UTC | #1
On Wed, Jul 12, 2017 at 02:12:24PM +0200, Daniel Vetter wrote:
> The problem is that we have a distributed cache: every committer has
> a copy. This means even a slight clock skew will make a naive gc
> algorithm thrash a lot.
> 
> To fix this, add a huge hysteresis: only add files newer than 1 day,
> and only remove them when they are older than 60 days. As long as people
> have reasonably accurate clocks on their machines, this should work.
> 
> A different problem is that we can't use filesystem timestamps (and
> hence can't use git rerere gc): when someone comes back from vacation
> and updates git rerere, all the files will have current timestamps,
> even when they were pushed out weeks ago. To fix that, use the git
> log to decide which old files to remove. Also, remove old files before
> adding new ones, to avoid confusion.
> 
> Also, we need to teach the cp -r to preserve timestamps, otherwise
> this won't work.
> 
> v2: Use git log to remove old files.
> 
> v3: Remove the debug uncommenting (Sean).
> 
> v4: Split out code movement and explain better what's going on (Jani).

Yeah, much easier to digest with the split.

Reviewed-by: Sean Paul <seanpaul@chromium.org>

> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Jani Nikula July 14, 2017, 9:57 a.m. UTC | #2
On Thu, 13 Jul 2017, Sean Paul <seanpaul@chromium.org> wrote:
>
> Yeah, much easier to digest with the split.
>
> Reviewed-by: Sean Paul <seanpaul@chromium.org>

I'll trust that and hope for the best. ;)

BR,
Jani.


Daniel Vetter July 14, 2017, 1:46 p.m. UTC | #3
On Fri, Jul 14, 2017 at 12:57:23PM +0300, Jani Nikula wrote:
> On Thu, 13 Jul 2017, Sean Paul <seanpaul@chromium.org> wrote:
> >
> > Yeah, much easier to digest with the split.
> >
> > Reviewed-by: Sean Paul <seanpaul@chromium.org>
> 
> I'll trust that and hope for the best. ;)

Pushed and asked everyone I managed to ping over irc to upgrade. Let's see
how often we have to push out a revert until the old stuff is dead for
good (I already screwed up twice locally ...).
-Daniel

> 
> BR,
> Jani.
> 
> 
> >
> >> 
> >> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> >> ---
> >>  dim | 10 ++++++++--
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/dim b/dim
> >> index b788edd29653..79d616cbf354 100755
> >> --- a/dim
> >> +++ b/dim
> >> @@ -513,9 +513,15 @@ function commit_rerere_cache
> >>  
> >>  		git pull >& /dev/null
> >>  		rm $(rr_cache_dir)/rr-cache -Rf &> /dev/null || true
> >> -		cp $(rr_cache_dir)/* rr-cache -r
> >> +		cp $(rr_cache_dir)/* rr-cache -r --preserve=timestamps
> >>  		git add ./*.patch >& /dev/null || true
> >> -		git add rr-cache/* > /dev/null
> >> +		for file  in $(git ls-files); do
> >> +			if ! git log --since="60 days ago" --name-only -- $file | grep $file &> /dev/null; then
> >> +				git rm $file &> /dev/null
> >> +				echo deleting $file
> >> +			fi
> >> +		done
> >> +		find rr-cache/ -ctime -1 -type f -print0 | xargs -0 git add > /dev/null
> >>  		git rm rr-cache/rr-cache &> /dev/null || true
> >>  		if git commit -m "$time: $integration_branch rerere cache update" >& /dev/null; then
> >>  			echo -n "New commit. "
> >> -- 
> >> 2.13.2
> >> 
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center

Patch

diff --git a/dim b/dim
index b788edd29653..79d616cbf354 100755
--- a/dim
+++ b/dim
@@ -513,9 +513,15 @@  function commit_rerere_cache
 
 		git pull >& /dev/null
 		rm $(rr_cache_dir)/rr-cache -Rf &> /dev/null || true
-		cp $(rr_cache_dir)/* rr-cache -r
+		cp $(rr_cache_dir)/* rr-cache -r --preserve=timestamps
 		git add ./*.patch >& /dev/null || true
-		git add rr-cache/* > /dev/null
+		for file  in $(git ls-files); do
+			if ! git log --since="60 days ago" --name-only -- $file | grep $file &> /dev/null; then
+				git rm $file &> /dev/null
+				echo deleting $file
+			fi
+		done
+		find rr-cache/ -ctime -1 -type f -print0 | xargs -0 git add > /dev/null
 		git rm rr-cache/rr-cache &> /dev/null || true
 		if git commit -m "$time: $integration_branch rerere cache update" >& /dev/null; then
 			echo -n "New commit. "
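
The removal loop in the patch relies on unquoted `$file` and on `grep $file`, which treats the file name as a regular expression. That is fine for typical rr-cache names, but a more defensive variant could look like the sketch below. This is not what dim does: the function name `gc_stale_files` and its day-count argument are assumptions for illustration. It reads one name per line, so names containing newlines, or names that git quotes in `ls-files` output, would still need `git ls-files -z`.

```shell
# Hypothetical variant of the removal loop: drop every tracked file that no
# commit has touched within the last "$1" days. Run inside the repository.
gc_stale_files() {
	git ls-files | while IFS= read -r file; do
		# Empty output means no commit in the window touched this file.
		if [ -z "$(git log -1 --since="$1 days ago" --format=%H -- "$file")" ]; then
			git rm -q -- "$file"
			echo "deleting $file"
		fi
	done
}
```

Calling `gc_stale_files 60` mirrors the 60-day window used in the patch.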