diff mbox series

[4/8] read-cache: add invalidate parameter to remove_marked_cache_entries

Message ID 20181209200449.16342-5-t.gummerer@gmail.com (mailing list archive)
State New, archived
Headers show
Series introduce no-overlay and cached mode in git checkout | expand

Commit Message

Thomas Gummerer Dec. 9, 2018, 8:04 p.m. UTC
When marking cache entries for removal, and later removing them all at
once using 'remove_marked_cache_entries()', cache entries currently
have to be invalidated manually in the cache tree and in the untracked
cache.

Add an invalidate flag to the function.  With the flag set, the
function will take care of invalidating the path in the cache tree and
in the untracked cache.

This will be useful in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

For the two current callsites, unpack-trees seems to do this
invalidation itself internally.  I don't quite understand why we don't
need it in split-index mode though.  I assume it's because the cache
tree in the main index would already have been invalidated?  I didn't
have much time to dig, but couldn't produce any failures with it
either, so I assume not invalidating paths is the right thing to do
here.

 cache.h        | 2 +-
 read-cache.c   | 8 +++++++-
 split-index.c  | 2 +-
 unpack-trees.c | 2 +-
 4 files changed, 10 insertions(+), 4 deletions(-)

Comments

Duy Nguyen Dec. 10, 2018, 4:08 p.m. UTC | #1
On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> When marking cache entries for removal, and later removing them all at
> once using 'remove_marked_cache_entries()', cache entries currently
> have to be invalidated manually in the cache tree and in the untracked
> cache.
>
> Add an invalidate flag to the function.  With the flag set, the
> function will take care of invalidating the path in the cache tree and
> in the untracked cache.
>
> This will be useful in a subsequent commit.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>
> For the two current callsites, unpack-trees seems to do this
> invalidation itself internally.

I'm still a bit scared of this invalidation business in unpack-trees.
The thing is, we handle two separate index_state there, src_index and
result and invalidation has to be done on the right one (because index
extensions are on src_index until the very end of unpack-trees;
invalidating on 'result' would be no-op and wrong).
remove_marked_cache_entries() seems to be called on 'result' while
invalidate_ce_path() is on src_index, hm....

> I don't quite understand why we don't
> need it in split-index mode though.  I assume it's because the cache
> tree in the main index would already have been invalidated?  I didn't
> have much time to dig, but couldn't produce any failures with it
> either, so I assume not invalidating paths is the right thing to do
> here.

Yeah I think it's because cache-tree and untracked cache are already
properly invalidated. This merge base thingy is done when we load the
index files up, not when we write them down. The "front" index may
record that a few paths in the base index are no longer valid and need
to be deleted. But untracked cache and cache-tree both should have
recorded that same info when these paths are marked for delete at
index write time.
Elijah Newren Dec. 10, 2018, 6:09 p.m. UTC | #2
On Mon, Dec 10, 2018 at 8:09 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > When marking cache entries for removal, and later removing them all at
> > once using 'remove_marked_cache_entries()', cache entries currently
> > have to be invalidated manually in the cache tree and in the untracked
> > cache.
> >
> > Add an invalidate flag to the function.  With the flag set, the
> > function will take care of invalidating the path in the cache tree and
> > in the untracked cache.
> >
> > This will be useful in a subsequent commit.
> >
> > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > ---
> >
> > For the two current callsites, unpack-trees seems to do this
> > invalidation itself internally.
>
> I'm still a bit scared of this invalidation business in unpack-trees.
> The thing is, we handle two separate index_state there, src_index and
> result and invalidation has to be done on the right one (because index
> extensions are on src_index until the very end of unpack-trees;
> invalidating on 'result' would be no-op and wrong).
> remove_marked_cache_entries() seems to be called on 'result' while
> invalidate_ce_path() is on src_index, hm....

Is Thomas avoiding problems here simply because merge is the only
caller of unpack_trees with src_index != dst_index?  Or does src_index
== dst_index for checkout not actually help?

If that does help with the checkout case, then allow me to find a
different way to muddy the waters...  I think I might want to make use
of this function in the merge machinery at some point, so I either
need to figure out how to convince you to verify if all this cache
tree invalidation stuff is sane, or somehow figure out all the
cache_tree stuff stuff myself so I can figure out what is right here.
:-)

> > I don't quite understand why we don't
> > need it in split-index mode though.  I assume it's because the cache
> > tree in the main index would already have been invalidated?  I didn't
> > have much time to dig, but couldn't produce any failures with it
> > either, so I assume not invalidating paths is the right thing to do
> > here.
>
> Yeah I think it's because cache-tree and untracked cache are already
> properly invalidated. This merge base thingy is done when we load the
> index files up, not when we write them down. The "front" index may
> record that a few paths in the base index are no longer valid and need
> to be deleted. But untracked cache and cache-tree both should have
> recorded that same info when these paths are marked for delete at
> index write time.
Duy Nguyen Dec. 10, 2018, 6:19 p.m. UTC | #3
On Mon, Dec 10, 2018 at 7:09 PM Elijah Newren <newren@gmail.com> wrote:
> > > For the two current callsites, unpack-trees seems to do this
> > > invalidation itself internally.
> >
> > I'm still a bit scared of this invalidation business in unpack-trees.
> > The thing is, we handle two separate index_state there, src_index and
> > result and invalidation has to be done on the right one (because index
> > extensions are on src_index until the very end of unpack-trees;
> > invalidating on 'result' would be no-op and wrong).
> > remove_marked_cache_entries() seems to be called on 'result' while
> > invalidate_ce_path() is on src_index, hm....
>
> Is Thomas avoiding problems here simply because merge is the only
> caller of unpack_trees with src_index != dst_index?  Or does src_index
> == dst_index for checkout not actually help?

I think it would not help. 'result' is a temporary index where we copy
things to (and it does not have anything from the beginning). If you
invalidate stuff in there, you invalidate nothing, regardless whether
dst_index == src_index.

> If that does help with the checkout case, then allow me to find a
> different way to muddy the waters...  I think I might want to make use
> of this function in the merge machinery at some point, so I either
> need to figure out how to convince you to verify if all this cache
> tree invalidation stuff is sane, or somehow figure out all the
> cache_tree stuff stuff myself so I can figure out what is right here.
> :-)

I'm not the unpack-trees man (I think that would still be Junio). And
I'm not saying it's sane either. I think it's just some leftover
things since Linus split "the index" in unpack-tree operation to
'src', 'result' and 'dst' many years ago and nobody was brave enough
to clean it up (then I piled on with untracked cache and split index,
but I did not see it clearly either). That person could be you ;-)
Elijah Newren Dec. 10, 2018, 6:25 p.m. UTC | #4
On Mon, Dec 10, 2018 at 10:19 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Mon, Dec 10, 2018 at 7:09 PM Elijah Newren <newren@gmail.com> wrote:
> > > > For the two current callsites, unpack-trees seems to do this
> > > > invalidation itself internally.
> > >
> > > I'm still a bit scared of this invalidation business in unpack-trees.
> > > The thing is, we handle two separate index_state there, src_index and
> > > result and invalidation has to be done on the right one (because index
> > > extensions are on src_index until the very end of unpack-trees;
> > > invalidating on 'result' would be no-op and wrong).
> > > remove_marked_cache_entries() seems to be called on 'result' while
> > > invalidate_ce_path() is on src_index, hm....
> >
> > Is Thomas avoiding problems here simply because merge is the only
> > caller of unpack_trees with src_index != dst_index?  Or does src_index
> > == dst_index for checkout not actually help?
>
> I think it would not help. 'result' is a temporary index where we copy
> things to (and it does not have anything from the beginning). If you
> invalidate stuff in there, you invalidate nothing, regardless whether
> dst_index == src_index.
>
> > If that does help with the checkout case, then allow me to find a
> > different way to muddy the waters...  I think I might want to make use
> > of this function in the merge machinery at some point, so I either
> > need to figure out how to convince you to verify if all this cache
> > tree invalidation stuff is sane, or somehow figure out all the
> > cache_tree stuff stuff myself so I can figure out what is right here.
> > :-)
>
> I'm not the unpack-trees man (I think that would still be Junio). And
> I'm not saying it's sane either. I think it's just some leftover
> things since Linus split "the index" in unpack-tree operation to
> 'src', 'result' and 'dst' many years ago and nobody was brave enough
> to clean it up (then I piled on with untracked cache and split index,
> but I did not see it clearly either). That person could be you ;-)

Hmm, might make a good New Year's resolution: Enter the abyss, find
out if one can return from it...  or maybe I could just sanely run
away screaming.  We'll see.
Duy Nguyen Dec. 10, 2018, 6:33 p.m. UTC | #5
On Mon, Dec 10, 2018 at 7:25 PM Elijah Newren <newren@gmail.com> wrote:
> > I'm not the unpack-trees man (I think that would still be Junio). And
> > I'm not saying it's sane either. I think it's just some leftover
> > things since Linus split "the index" in unpack-tree operation to
> > 'src', 'result' and 'dst' many years ago and nobody was brave enough
> > to clean it up (then I piled on with untracked cache and split index,
> > but I did not see it clearly either). That person could be you ;-)
>
> Hmm, might make a good New Year's resolution: Enter the abyss, find
> out if one can return from it...  or maybe I could just sanely run
> away screaming.  We'll see.

I'm getting off topic. But my new years resolution would be optimize
for the case where src_index == dst_index, which is somewhat ironic
because we used to do everything in the same index, but it was a messy
mess and had to be split up.
Elijah Newren Dec. 10, 2018, 6:47 p.m. UTC | #6
On Mon, Dec 10, 2018 at 10:34 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Mon, Dec 10, 2018 at 7:25 PM Elijah Newren <newren@gmail.com> wrote:
> > > I'm not the unpack-trees man (I think that would still be Junio). And
> > > I'm not saying it's sane either. I think it's just some leftover
> > > things since Linus split "the index" in unpack-tree operation to
> > > 'src', 'result' and 'dst' many years ago and nobody was brave enough
> > > to clean it up (then I piled on with untracked cache and split index,
> > > but I did not see it clearly either). That person could be you ;-)
> >
> > Hmm, might make a good New Year's resolution: Enter the abyss, find
> > out if one can return from it...  or maybe I could just sanely run
> > away screaming.  We'll see.
>
> I'm getting off topic. But my new years resolution would be optimize
> for the case where src_index == dst_index, which is somewhat ironic
> because we used to do everything in the same index, but it was a messy
> mess and had to be split up.

Ooh, that sounds cool too.  I look forward to seeing it.
Junio C Hamano Dec. 11, 2018, 2:42 a.m. UTC | #7
Duy Nguyen <pclouds@gmail.com> writes:

> I'm still a bit scared of this invalidation business in unpack-trees.

I too was (and I suspect that I would realize that I still am, if I
take another fresh look at the current code) afraid when I did the
cache-tree work and decided to invalidate it as a whole upfront.

> The thing is, we handle two separate index_state there, src_index and
> result and invalidation has to be done on the right one (because index
> extensions are on src_index until the very end of unpack-trees;
> invalidating on 'result' would be no-op and wrong).
> ...
> Yeah I think it's because cache-tree and untracked cache are already
> properly invalidated. ...
Thomas Gummerer Dec. 11, 2018, 9:59 p.m. UTC | #8
On 12/10, Elijah Newren wrote:
> On Mon, Dec 10, 2018 at 8:09 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Sun, Dec 9, 2018 at 9:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> > >
> > > When marking cache entries for removal, and later removing them all at
> > > once using 'remove_marked_cache_entries()', cache entries currently
> > > have to be invalidated manually in the cache tree and in the untracked
> > > cache.
> > >
> > > Add an invalidate flag to the function.  With the flag set, the
> > > function will take care of invalidating the path in the cache tree and
> > > in the untracked cache.
> > >
> > > This will be useful in a subsequent commit.
> > >
> > > Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> > > ---
> > >
> > > For the two current callsites, unpack-trees seems to do this
> > > invalidation itself internally.
> >
> > I'm still a bit scared of this invalidation business in unpack-trees.
> > The thing is, we handle two separate index_state there, src_index and
> > result and invalidation has to be done on the right one (because index
> > extensions are on src_index until the very end of unpack-trees;
> > invalidating on 'result' would be no-op and wrong).
> > remove_marked_cache_entries() seems to be called on 'result' while
> > invalidate_ce_path() is on src_index, hm....
> 
> Is Thomas avoiding problems here simply because merge is the only
> caller of unpack_trees with src_index != dst_index?  Or does src_index
> == dst_index for checkout not actually help?

I'm trying to avoid problems in this patch by keeping status quo, and
not changing the cache-tree invalidation in any way.  'git checkout --
<pathspec>' doesn't use unpack-trees, so I don't think I have to worry
about src_index vs. dst_index.

In what I was saying above I was merely trying to explain why we don't
need invalidate the cache-tree in the 'remove_marked_cache_entries()'
function.

> If that does help with the checkout case, then allow me to find a
> different way to muddy the waters...  I think I might want to make use
> of this function in the merge machinery at some point, so I either
> need to figure out how to convince you to verify if all this cache
> tree invalidation stuff is sane, or somehow figure out all the
> cache_tree stuff stuff myself so I can figure out what is right here.
> :-)
> 
> > > I don't quite understand why we don't
> > > need it in split-index mode though.  I assume it's because the cache
> > > tree in the main index would already have been invalidated?  I didn't
> > > have much time to dig, but couldn't produce any failures with it
> > > either, so I assume not invalidating paths is the right thing to do
> > > here.
> >
> > Yeah I think it's because cache-tree and untracked cache are already
> > properly invalidated. This merge base thingy is done when we load the
> > index files up, not when we write them down. The "front" index may
> > record that a few paths in the base index are no longer valid and need
> > to be deleted. But untracked cache and cache-tree both should have
> > recorded that same info when these paths are marked for delete at
> > index write time.

Thanks, that makes sense to me.
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index c1c953e810..1deee48f5b 100644
--- a/cache.h
+++ b/cache.h
@@ -751,7 +751,7 @@  extern void rename_index_entry_at(struct index_state *, int pos, const char *new
 /* Remove entry, return true if there are more entries to go. */
 extern int remove_index_entry_at(struct index_state *, int pos);
 
-extern void remove_marked_cache_entries(struct index_state *istate);
+extern void remove_marked_cache_entries(struct index_state *istate, int invalidate);
 extern int remove_file_from_index(struct index_state *, const char *path);
 #define ADD_CACHE_VERBOSE 1
 #define ADD_CACHE_PRETEND 2
diff --git a/read-cache.c b/read-cache.c
index 4ca81286c0..d86a06acba 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -590,13 +590,19 @@  int remove_index_entry_at(struct index_state *istate, int pos)
  * CE_REMOVE is set in ce_flags.  This is much more effective than
  * calling remove_index_entry_at() for each entry to be removed.
  */
-void remove_marked_cache_entries(struct index_state *istate)
+void remove_marked_cache_entries(struct index_state *istate, int invalidate)
 {
 	struct cache_entry **ce_array = istate->cache;
 	unsigned int i, j;
 
 	for (i = j = 0; i < istate->cache_nr; i++) {
 		if (ce_array[i]->ce_flags & CE_REMOVE) {
+			if (invalidate) {
+				cache_tree_invalidate_path(istate,
+							   ce_array[i]->name);
+				untracked_cache_remove_from_index(istate,
+								  ce_array[i]->name);
+			}
 			remove_name_hash(istate, ce_array[i]);
 			save_or_free_index_entry(istate, ce_array[i]);
 		}
diff --git a/split-index.c b/split-index.c
index 5820412dc5..8aebc3661b 100644
--- a/split-index.c
+++ b/split-index.c
@@ -162,7 +162,7 @@  void merge_base_index(struct index_state *istate)
 	ewah_each_bit(si->replace_bitmap, replace_entry, istate);
 	ewah_each_bit(si->delete_bitmap, mark_entry_for_delete, istate);
 	if (si->nr_deletions)
-		remove_marked_cache_entries(istate);
+		remove_marked_cache_entries(istate, 0);
 
 	for (i = si->nr_replacements; i < si->saved_cache_nr; i++) {
 		if (!ce_namelen(si->saved_cache[i]))
diff --git a/unpack-trees.c b/unpack-trees.c
index e8d1a6ac50..8e6afa924d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -392,7 +392,7 @@  static int check_updates(struct unpack_trees_options *o)
 				unlink_entry(ce);
 		}
 	}
-	remove_marked_cache_entries(index);
+	remove_marked_cache_entries(index, 0);
 	remove_scheduled_dirs();
 
 	if (should_update_submodules() && o->update && !o->dry_run)