[v4,08/11] git-p4: p4CmdList - support Unicode encoding
diff mbox series

Message ID e1a424a955071414a634a703a85f1969f968bb0f.1575498578.git.gitgitgadget@gmail.com
State New
Headers show
Series
  • git-p4.py: Cast byte strings to unicode strings in python3
Related show

Commit Message

Ben Keene via GitGitGadget Dec. 4, 2019, 10:29 p.m. UTC
From: Ben Keene <seraphire@gmail.com>

The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

Change the code so that the key will always be encoded AS_STRING()

Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

Additionally, change literal text prior to conversion to be literal bytes.

Signed-off-by: Ben Keene <seraphire@gmail.com>
(cherry picked from commit 88306ac269186cbd0f6dc6cfd366b50b28ee4886)
---
 git-p4.py | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Dec. 5, 2019, 1:55 p.m. UTC | #1
"Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Ben Keene <seraphire@gmail.com>
>
> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.

Somewhere in the midway of the series, the log message starts using
all-caps AS_STRING and AS_BYTES to describe some specific things,
and it would help readers if the first one of these steps explain
what they mean (I am guessing AS_STRING is an unicode object in both
Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
or something like that?).

> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.

s/isto/is to/;

This sentence is a bit hard to read.

This change does not make the function optionally convert the input
we feed to the p4 command---it only changes the values in the
command output.  But the readers cannot tell that easily until
reading to the very end of the sentence, i.e. "returned by the
function", as written.

We probably want to be a bit more explicit to say what gets
converted; perhaps renaming the parameter to encode_cmd_output may
help.

> Change the code so that the key will always be encoded AS_STRING()

s/key/key of the returned hash/ or something to clarify what key you
are talking about.

> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.

"Data that is passed to the standard input stream of the p4 process"
to clarify whose standard input you are talking about (iow, "git p4"
also has and it may use its standard input, but this function does
not muck with it).
Ben Keene Dec. 5, 2019, 8:23 p.m. UTC | #2
On 12/5/2019 8:55 AM, Junio C Hamano wrote:
> "Ben Keene via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ben Keene <seraphire@gmail.com>
>>
>> The p4CmdList is a commonly used function in the git-p4 code. It is used to execute a command in P4 and return the results of the call in a list.
> Somewhere in the midway of the series, the log message starts using
> all-caps AS_STRING and AS_BYTES to describe some specific things,
> and it would help readers if the first one of these steps explain
> what they mean (I am guessing AS_STRING is an unicode object in both
> Python 2 and 3, and AS_BYTES is a plain vanilla string in Python 2,
> or something like that?).

I rewrote almost the entire commit message. Hopefully this will clarify 
the code.

>> Change this code to take a new optional parameter, encode_data that will optionally convert the data AS_STRING() that isto be returned by the function.
> s/isto/is to/;
>
> This sentence is a bit hard to read.
>
> This change does not make the function optionally convert the input
> we feed to the p4 command---it only changes the values in the
> command output.  But the readers cannot tell that easily until
> reading to the very end of the sentence, i.e. "returned by the
> function", as written.
>
> We probably want to be a bit more explicit to say what gets
> converted; perhaps renaming the parameter to encode_cmd_output may
> help.


I renamed the parameter as suggested.


>> Change the code so that the key will always be encoded AS_STRING()
> s/key/key of the returned hash/ or something to clarify what key you
> are talking about.
>
>> Data that is passed for standard input (stdin) should be AS_BYTES() to ensure unicode text that is supplied will be written out as bytes.
> "Data that is passed to the standard input stream of the p4 process"
> to clarify whose standard input you are talking about (iow, "git p4"
> also has and it may use its standard input, but this function does
> not muck with it).
>

Patch
diff mbox series

diff --git a/git-p4.py b/git-p4.py
index 0da640be93..f7c0ef0c53 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -711,7 +711,23 @@  def isModeExecChanged(src_mode, dst_mode):
     return isModeExec(src_mode) != isModeExec(dst_mode)
 
 def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
-        errors_as_exceptions=False):
+        errors_as_exceptions=False, encode_data=True):
+    """ Executes a P4 command:  'cmd' optionally passing 'stdin' to the command's
+        standard input via a temporary file with 'stdin_mode' mode.
+
+        Output from the command is optionally passed to the callback function 'cb'.
+        If 'cb' is None, the response from the command is parsed into a list
+        of resulting dictionaries. (For each block read from the process pipe.)
+
+        If 'skip_info' is true, information in a block read that has a code type of
+        'info' will be skipped.
+
+        If 'errors_as_exceptions' is set to true (the default is false) the error
+        code returned from the execution will generate an exception.
+
+        If 'encode_data' is set to true (the default) the data that is returned 
+        by this function will be passed through the "as_string" function.
+    """
 
     if not isinstance(cmd, list):
         cmd = "-G " + cmd
@@ -734,7 +750,7 @@  def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
             stdin_file.write(stdin)
         else:
             for i in stdin:
-                stdin_file.write(i + '\n')
+                stdin_file.write(as_bytes(i) + b'\n')
         stdin_file.flush()
         stdin_file.seek(0)
 
@@ -748,12 +764,15 @@  def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
         while True:
             entry = marshal.load(p4.stdout)
             if skip_info:
-                if 'code' in entry and entry['code'] == 'info':
+                if b'code' in entry and entry[b'code'] == b'info':
                     continue
             if cb is not None:
                 cb(entry)
             else:
-                result.append(entry)
+                out = {}
+                for key, value in entry.items():
+                    out[as_string(key)] = (as_string(value) if encode_data else value)
+                result.append(out)
     except EOFError:
         pass
     exitCode = p4.wait()