[0/2] git-p4: encoding of data from perforce

Message ID 20210412085251.51475-1-andrew@adoakley.name (mailing list archive)

Andrew Oakley, April 12, 2021, 8:52 a.m. UTC

When using Python 3, git-p4 fails to handle data from Perforce that is
not valid UTF-8.  In large repositories it's very likely that such data
will exist, because Perforce itself does not validate the data by default.
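
To illustrate the failure mode, here is a minimal sketch and not git-p4's
actual code.  The description bytes and the decode_strict() helper are
hypothetical; the sketch only shows that a strict UTF-8 decode in Python 3
raises on bytes written in another encoding such as Latin-1:

```python
# Hypothetical changelist description: "café" encoded as Latin-1, so the
# byte 0xE9 is not part of a valid UTF-8 sequence.
raw_desc = b"Fix caf\xe9 menu\n"


def decode_strict(data):
    """Decode as UTF-8, as a str-based Python 3 code path would (assumed)."""
    return data.decode("utf-8")


try:
    decode_strict(raw_desc)
    failed = False
except UnicodeDecodeError as err:
    # Python 3 refuses to guess; the import would stop here.
    failed = True
    print("decode failed:", err.reason)
```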

Historically git-p4 has just passed whatever bytes it got from perforce
into git.  This seems like a sensible approach - git-p4 has no idea what
encoding may have been used and it seems likely that different encodings
are used within a repository.

I was trying to do a more thorough job, moving more of git-p4 over to
using bytes.  Unfortunately the changes end up being large and hard to
review.  In most cases it's probably sufficient to just avoid decoding
the commit messages.
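
As a rough sketch of what "avoid decoding" could look like (the
write_data() helper and the sample bytes are hypothetical, not taken from
the patches), the message can be handed to git fast-import as raw bytes
using its "data <count>" framing, so no encoding is ever assumed:

```python
import io


def write_data(stream, payload):
    """Write payload in git fast-import's exact-byte-count 'data' format."""
    # The count is the number of bytes, so it is correct for any encoding.
    stream.write(b"data %d\n" % len(payload))
    stream.write(payload)
    stream.write(b"\n")


# Hypothetical commit message containing a Latin-1 byte (invalid as UTF-8).
raw_desc = b"Fix caf\xe9 menu\n"

out = io.BytesIO()  # stands in for the pipe to "git fast-import"
write_data(out, raw_desc)
```

The bytes reach git unchanged, which matches the historical behaviour
described above.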

There have been a couple of previous proposals around trying to decode
this data using a user-configured encoding:
http://public-inbox.org/git/CAE5ih7-F9efsiV5AQmw3ocjiy+BT6ZAT5fA0Lx0OSkVTO8Kqjg@mail.gmail.com/T/
http://public-inbox.org/git/20210409153815.7joohvmlnh6itczc@tb-raspi4/T/