diff mbox series

[b4] Fix 'LoreSeries::make_fake_am_range' with renamed, then modified file

Message ID 20221107134054.75514-1-levraiphilippeblain@gmail.com (mailing list archive)
State Accepted
Headers show
Series [b4] Fix 'LoreSeries::make_fake_am_range' with renamed, then modified file | expand

Commit Message

Philippe Blain Nov. 7, 2022, 1:40 p.m. UTC
When invoking 'b4 diff', LoreSeries::make_fake_am_range is used to
create the old and new versions of a series in a temporary worktree.

The code is currently unprepared for a series that renames a file
and then modifies it in a subsequent commit.

The first issue is that LoreMessage::get_indexes fails to set
blob_indexes for a renamed file since the 'diff --git' header for such a
change has different file names for the "old" and "new" file, but that
function skips any diff where "old" and "new" do not match. Fix that by
removing that condition.

The second issue is that LoreSeries::make_fake_am_range itself is
unprepared for a series that renames a file and then modifies it in a
separate commit. This is because when processing each commit to create
the "Initial fake commit", the file is added to the index with its new
name when the commit that modifies it is processed. This causes the 'git
am' invocation to fail at the earlier rename patch, because the new file
name is already in the index (since it was created in the "Initial fake
commit").

Fix this by also recording the new file name in the 'blob_indexes'
tuple, in addition to the old file name and hash, and making use of this
new information in 'make_fake_am_range' by explicitely checking for
rename patches when processing each commit to create the "Initial fake
commit", adjusting the 'seenfiles' list accordingly. Note that we must
also adjust the other user of 'blob_indexes', LoreSeries::populate_indexes.

For consistency, also rename local variables:
- in 'get_indexes', rename 'curfile' to 'oldfile'
- in 'populate_indexes', rename 'fn' (filename) and 'bh' (blob hash) to
  'ofn' and 'obh' (old ")
- in 'make_fake_am_range', rename 'fn' and 'fi' (file index) to 'ofn'
  and 'ofi' (idem)

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---

Here is a patch. I was not sure if I should make the same logic changes in
populate_indexes as I do in make_fake_am_range... I tried to read the code
paths where populate_indexes is used to see if the same situation could occur,
but I was not sure. 

 b4/__init__.py | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)


base-commit: cb6764b0d024316eb4a8b3bec5988d842af3b76d

Comments

Philippe Blain Nov. 28, 2022, 6:07 p.m. UTC | #1
Hi Konstantin,

Le 2022-11-07 à 08:40, Philippe Blain a écrit :
> When invoking 'b4 diff', LoreSeries::make_fake_am_range is used to
> create the old and new versions of a series in a temporary worktree.
> 
> The code is currently unprepared for a series that renames a file
> and then modifies it in a subsequent commit.
> 
> The first issue is that LoreMessage::get_indexes fails to set
> blob_indexes for a renamed file since the 'diff --git' header for such a
> change has different file names for the "old" and "new" file, but that
> function skips any diff where "old" and "new" do not match. Fix that by
> removing that condition.
> 
> The second issue is that LoreSeries::make_fake_am_range itself is
> unprepared for a series that renames a file and then modifies it in a
> separate commit. This is because when processing each commit to create
> the "Initial fake commit", the file is added to the index with its new
> name when the commit that modifies it is processed. This causes the 'git
> am' invocation to fail at the earlier rename patch, because the new file
> name is already in the index (since it was created in the "Initial fake
> commit").
> 
> Fix this by also recording the new file name in the 'blob_indexes'
> tuple, in addition to the old file name and hash, and making use of this
> new information in 'make_fake_am_range' by explicitely checking for
> rename patches when processing each commit to create the "Initial fake
> commit", adjusting the 'seenfiles' list accordingly. Note that we must
> also adjust the other user of 'blob_indexes', LoreSeries::populate_indexes.
> 
> For consistency, also rename local variables:
> - in 'get_indexes', rename 'curfile' to 'oldfile'
> - in 'populate_indexes', rename 'fn' (filename) and 'bh' (blob hash) to
>   'ofn' and 'obh' (old ")
> - in 'make_fake_am_range', rename 'fn' and 'fi' (file index) to 'ofn'
>   and 'ofi' (idem)
> 
> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
> ---
> 
> Here is a patch. I was not sure if I should make the same logic changes in
> populate_indexes as I do in make_fake_am_range... I tried to read the code
> paths where populate_indexes is used to see if the same situation could occur,
> but I was not sure. 
> 
>  b4/__init__.py | 43 ++++++++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 19 deletions(-)
> 
> diff --git a/b4/__init__.py b/b4/__init__.py
> index 4456414..6c66893 100644
> --- a/b4/__init__.py
> +++ b/b4/__init__.py
> @@ -640,16 +640,16 @@ def populate_indexes(self):
>          for lmsg in self.patches[1:]:
>              if lmsg is None or lmsg.blob_indexes is None:
>                  continue
> -            for fn, bh in lmsg.blob_indexes:
> -                if fn in seenfiles:
> +            for ofn, obh, nfn in lmsg.blob_indexes:
> +                if ofn in seenfiles:
>                      # if we have seen this file once already, then it's a repeat patch
>                      # it's no longer going to match current hash
>                      continue
> -                seenfiles.add(fn)
> -                if set(bh) == {'0'}:
> +                seenfiles.add(ofn)
> +                if set(obh) == {'0'}:
>                      # New file, will for sure apply clean
>                      continue
> -                self.indexes.append((fn, bh))
> +                self.indexes.append((ofn, obh))
>  
>      def check_applies_clean(self, gitdir: str, at: Optional[str] = None) -> Tuple[int, list]:
>          if self.indexes is None:
> @@ -793,29 +793,33 @@ def make_fake_am_range(self, gitdir):
>                      logger.critical('ERROR: some patches do not have indexes')
>                      logger.critical('       unable to create a fake-am range')
>                      return None, None
> -                for fn, fi in lmsg.blob_indexes:
> -                    if fn in seenfiles:
> +                for ofn, ofi, nfn in lmsg.blob_indexes:
> +                    if ofn in seenfiles:
>                          # We already processed this file, so this blob won't match
>                          continue
> -                    seenfiles.add(fn)
> -                    if set(fi) == {'0'}:
> +                    seenfiles.add(ofn)
> +                    if set(ofi) == {'0'}:
>                          # New file creation, nothing to do here
> -                        logger.debug('  New file: %s', fn)
> +                        logger.debug('  New file: %s', ofn)
>                          continue
> +                    if not ofn == nfn:
> +                        # renamed file, make sure to not add the new name later on
> +                        logger.debug('  Renamed file: %s -> %s', ofn, nfn)
> +                        seenfiles.add(nfn)
>                      # Try to grab full ref_id of this hash
> -                    ecode, out = git_run_command(gitdir, ['rev-parse', fi])
> +                    ecode, out = git_run_command(gitdir, ['rev-parse', ofi])
>                      if ecode > 0:
> -                        logger.critical('  ERROR: Could not find matching blob for %s (%s)', fn, fi)
> +                        logger.critical('  ERROR: Could not find matching blob for %s (%s)', ofn, ofi)
>                          logger.critical('         If you know on which tree this patchset is based,')
>                          logger.critical('         add it as a remote and perform "git remote update"')
>                          logger.critical('         in order to fetch the missing objects.')
>                          return None, None
> -                    logger.debug('  Found matching blob for: %s', fn)
> +                    logger.debug('  Found matching blob for: %s', ofn)
>                      fullref = out.strip()
> -                    gitargs = ['update-index', '--add', '--cacheinfo', f'0644,{fullref},{fn}']
> +                    gitargs = ['update-index', '--add', '--cacheinfo', f'0644,{fullref},{ofn}']
>                      ecode, out = git_run_command(None, gitargs)
>                      if ecode > 0:
> -                        logger.critical('  ERROR: Could not run update-index for %s (%s)', fn, fullref)
> +                        logger.critical('  ERROR: Could not run update-index for %s (%s)', ofn, fullref)
>                          return None, None
>                  mbx.add(lmsg.msg.as_string(policy=emlpolicy).encode('utf-8'))
>  
> @@ -1517,12 +1521,13 @@ def get_indexes(diff: str) -> Set[tuple]:
>              if line.find('diff ') != 0 and line.find('index ') != 0:
>                  continue
>              matches = re.search(r'^diff\s+--git\s+\w/(.*)\s+\w/(.*)$', line)
> -            if matches and matches.groups()[0] == matches.groups()[1]:
> -                curfile = matches.groups()[0]
> +            if matches:
> +                oldfile = matches.groups()[0]
> +                newfile = matches.groups()[1]
>                  continue
>              matches = re.search(r'^index\s+([\da-f]+)\.\.[\da-f]+.*$', line)
> -            if matches and curfile is not None:
> -                indexes.add((curfile, matches.groups()[0]))
> +            if matches and oldfile is not None:
> +                indexes.add((oldfile, matches.groups()[0], newfile))
>          return indexes
>  
>      @staticmethod
> 
> base-commit: cb6764b0d024316eb4a8b3bec5988d842af3b76d
> 

Ping ?
Konstantin Ryabitsev Nov. 29, 2022, 6:42 p.m. UTC | #2
On Mon, 07 Nov 2022 08:40:54 -0500, Philippe Blain wrote:
> When invoking 'b4 diff', LoreSeries::make_fake_am_range is used to
> create the old and new versions of a series in a temporary worktree.
> 
> The code is currently unprepared for a series that renames a file
> and then modifies it in a subsequent commit.
> 
> The first issue is that LoreMessage::get_indexes fails to set
> blob_indexes for a renamed file since the 'diff --git' header for such a
> change has different file names for the "old" and "new" file, but that
> function skips any diff where "old" and "new" do not match. Fix that by
> removing that condition.
> 
> [...]

Applied, thanks!

[1/1] Fix 'LoreSeries::make_fake_am_range' with renamed, then modified file
      commit: 10cabf05b0404f7a87522180750d003985211814

Best regards,
diff mbox series

Patch

diff --git a/b4/__init__.py b/b4/__init__.py
index 4456414..6c66893 100644
--- a/b4/__init__.py
+++ b/b4/__init__.py
@@ -640,16 +640,16 @@  def populate_indexes(self):
         for lmsg in self.patches[1:]:
             if lmsg is None or lmsg.blob_indexes is None:
                 continue
-            for fn, bh in lmsg.blob_indexes:
-                if fn in seenfiles:
+            for ofn, obh, nfn in lmsg.blob_indexes:
+                if ofn in seenfiles:
                     # if we have seen this file once already, then it's a repeat patch
                     # it's no longer going to match current hash
                     continue
-                seenfiles.add(fn)
-                if set(bh) == {'0'}:
+                seenfiles.add(ofn)
+                if set(obh) == {'0'}:
                     # New file, will for sure apply clean
                     continue
-                self.indexes.append((fn, bh))
+                self.indexes.append((ofn, obh))
 
     def check_applies_clean(self, gitdir: str, at: Optional[str] = None) -> Tuple[int, list]:
         if self.indexes is None:
@@ -793,29 +793,33 @@  def make_fake_am_range(self, gitdir):
                     logger.critical('ERROR: some patches do not have indexes')
                     logger.critical('       unable to create a fake-am range')
                     return None, None
-                for fn, fi in lmsg.blob_indexes:
-                    if fn in seenfiles:
+                for ofn, ofi, nfn in lmsg.blob_indexes:
+                    if ofn in seenfiles:
                         # We already processed this file, so this blob won't match
                         continue
-                    seenfiles.add(fn)
-                    if set(fi) == {'0'}:
+                    seenfiles.add(ofn)
+                    if set(ofi) == {'0'}:
                         # New file creation, nothing to do here
-                        logger.debug('  New file: %s', fn)
+                        logger.debug('  New file: %s', ofn)
                         continue
+                    if not ofn == nfn:
+                        # renamed file, make sure to not add the new name later on
+                        logger.debug('  Renamed file: %s -> %s', ofn, nfn)
+                        seenfiles.add(nfn)
                     # Try to grab full ref_id of this hash
-                    ecode, out = git_run_command(gitdir, ['rev-parse', fi])
+                    ecode, out = git_run_command(gitdir, ['rev-parse', ofi])
                     if ecode > 0:
-                        logger.critical('  ERROR: Could not find matching blob for %s (%s)', fn, fi)
+                        logger.critical('  ERROR: Could not find matching blob for %s (%s)', ofn, ofi)
                         logger.critical('         If you know on which tree this patchset is based,')
                         logger.critical('         add it as a remote and perform "git remote update"')
                         logger.critical('         in order to fetch the missing objects.')
                         return None, None
-                    logger.debug('  Found matching blob for: %s', fn)
+                    logger.debug('  Found matching blob for: %s', ofn)
                     fullref = out.strip()
-                    gitargs = ['update-index', '--add', '--cacheinfo', f'0644,{fullref},{fn}']
+                    gitargs = ['update-index', '--add', '--cacheinfo', f'0644,{fullref},{ofn}']
                     ecode, out = git_run_command(None, gitargs)
                     if ecode > 0:
-                        logger.critical('  ERROR: Could not run update-index for %s (%s)', fn, fullref)
+                        logger.critical('  ERROR: Could not run update-index for %s (%s)', ofn, fullref)
                         return None, None
                 mbx.add(lmsg.msg.as_string(policy=emlpolicy).encode('utf-8'))
 
@@ -1517,12 +1521,13 @@  def get_indexes(diff: str) -> Set[tuple]:
             if line.find('diff ') != 0 and line.find('index ') != 0:
                 continue
             matches = re.search(r'^diff\s+--git\s+\w/(.*)\s+\w/(.*)$', line)
-            if matches and matches.groups()[0] == matches.groups()[1]:
-                curfile = matches.groups()[0]
+            if matches:
+                oldfile = matches.groups()[0]
+                newfile = matches.groups()[1]
                 continue
             matches = re.search(r'^index\s+([\da-f]+)\.\.[\da-f]+.*$', line)
-            if matches and curfile is not None:
-                indexes.add((curfile, matches.groups()[0]))
+            if matches and oldfile is not None:
+                indexes.add((oldfile, matches.groups()[0], newfile))
         return indexes
 
     @staticmethod