diff mbox series

[patatt] Handle MIME encoded-word & other header manglings

Message ID 20210530162625.31243-1-paul@pbarker.dev (mailing list archive)
State New, archived
Headers show
Series [patatt] Handle MIME encoded-word & other header manglings | expand

Commit Message

Paul Barker May 30, 2021, 4:26 p.m. UTC
When testing patatt with patches sent to a sr.ht hosted mailing list, it
was found that long header lines (such as the X-Developer-Signature
line) were re-encoded using the MIME encoded-word syntax (RFC 2047) when
an mbox archive is generated, causing patatt to choke on the resulting
text which looks like this:

    X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
     bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; =?utf-8?q?b=3DowGbwMvMwCF2?=
     =?utf-8?q?w7xIXuiX9CvG02pJDAmb67lTNi0+IeF97TL76vtKD7xjSjaluz0o/KfmZLX8rMi7_?=
     =?utf-8?q?l3M6O0pZGMQ4GGTFFFl2z951+fqDJVt7b0gHw8xhZQIZwsDFKQATydFhZJi+fFfvJ?=
     =?utf-8?q?8+0MF7GrfzWnP?=
     K7mAM/3n/r/UC+bprf6/g114QYGdbHcsaK7b1nanfA4IeZi1V0lL26cruXUWxgSEnNDP1FrAA=

Avoiding this issue by neatly wrapping the X-Developer-Signature header
before sending doesn't appear to be possible without making invasive
changes to git-send-email and/or the Net::SMTP perl module. The header
content generated by patatt is wrapped at 78 characters as can be seen
here from a locally signed patch file:

    X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
    bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
    b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmbN1xO2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv
    Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBMpG0Dw/9Kpzgpc8UsQwOPK/taW6
    dFnZyy5QlXPfNCC4WTc76ft9ZnZJjI37a17fP7sxvclKJ1tm36EhITcK62Pphje9KrmOxMJg4A

Running `git send-email --smtp-debug=1 0001.patch` shows that this is
joined into a single long line before the message is sent:

    Net::SMTP::_SSL=GLOB(0x5646fbdc3ac8)>>> X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject; bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmb571P2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBM5JA3I8O5hP6Tqm7lJst0rldcux 1V7M4q8T5o1fPU6Zs+hxj+SjvN8D/DK3rn8b0m34/Xy388Yeu8jvFdJf/c6Y6LDU7Hulj01nAAAA==

So we need to accept that the X-Developer-Signature line may be quite
long and so may be re-encoded by a mail server or archiver.

The Python email.header module provides the decode_header() and
make_header() functions which can be used to handle MIME encoded-word
syntax or other header manglings which may occur. The decode_header()
function requires a str argument so we must decode our bytes before
using this function. Thankfully, RFC 2822 makes life easy here as it
says that all header content must be composed of US-ASCII characters
(see section 2.2 of the RFC) so decoding is straightforward. The header
content is re-encoded into bytes after un-mangling to avoid having to
modify every other location in patatt where the header content is
accessed.

Signed-off-by: Paul Barker <paul@pbarker.dev>
---
 patatt/__init__.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Konstantin Ryabitsev May 31, 2021, 1:04 p.m. UTC | #1
On Sun, May 30, 2021 at 05:26:25PM +0100, Paul Barker wrote:
> When testing patatt with patches sent to a sr.ht hosted mailing list, it
> was found that long header lines (such as the X-Developer-Signature
> line) were re-encoded using the MIME encoded-word syntax (RFC 2047) when
> an mbox archive is generated, causing patatt to choke on the resulting
> text which looks like this:
> 
>     X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
>      bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; =?utf-8?q?b=3DowGbwMvMwCF2?=
>      =?utf-8?q?w7xIXuiX9CvG02pJDAmb67lTNi0+IeF97TL76vtKD7xjSjaluz0o/KfmZLX8rMi7_?=
>      =?utf-8?q?l3M6O0pZGMQ4GGTFFFl2z951+fqDJVt7b0gHw8xhZQIZwsDFKQATydFhZJi+fFfvJ?=
>      =?utf-8?q?8+0MF7GrfzWnP?=
>      K7mAM/3n/r/UC+bprf6/g114QYGdbHcsaK7b1nanfA4IeZi1V0lL26cruXUWxgSEnNDP1FrAA=

Bah. I understand that we must handle this situation, but this is an annoying
and unncessary hack when WSP characters are already conveniently present in
the header.

> Avoiding this issue by neatly wrapping the X-Developer-Signature header
> before sending doesn't appear to be possible without making invasive
> changes to git-send-email and/or the Net::SMTP perl module. The header
> content generated by patatt is wrapped at 78 characters as can be seen
> here from a locally signed patch file:
> 
>     X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
>     bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
>     b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmbN1xO2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv
>     Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBMpG0Dw/9Kpzgpc8UsQwOPK/taW6
>     dFnZyy5QlXPfNCC4WTc76ft9ZnZJjI37a17fP7sxvclKJ1tm36EhITcK62Pphje9KrmOxMJg4A
> 
> Running `git send-email --smtp-debug=1 0001.patch` shows that this is
> joined into a single long line before the message is sent:
> 
>     Net::SMTP::_SSL=GLOB(0x5646fbdc3ac8)>>> X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject; bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmb571P2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBM5JA3I8O5hP6Tqm7lJst0rldcux 1V7M4q8T5o1fPU6Zs+hxj+SjvN8D/DK3rn8b0m34/Xy388Yeu8jvFdJf/c6Y6LDU7Hulj01nAAAA==

This, too, bugs me to no end. :) I've actually considered making --hook not
perform any wrapping, but I'm hoping that git-send-email can be fixed to not
do this. To be compliant with (admittedly old and obsoleted) email RFCs,
headers must be no longer than 78 characters, so I'm pretty sure what
git-send-email does is an unintentional side-effect.

> So we need to accept that the X-Developer-Signature line may be quite
> long and so may be re-encoded by a mail server or archiver.

Agreed, thanks for finding this.

>      @staticmethod
>      def _dkim_canonicalize_header(hval: bytes) -> bytes:

I think it would make sense to wrap this around in:

if hval.find(b'?q?') >= 0:

This will save unnecessary decoding/encoding/header parsing. Do you agree?

> +        # The decode_header() function we're about to call requires a str
> +        # argument. Since RFC2822 (sec 2.2), header fields must be ASCII
> +        # characters so this is easy to achieve.
> +        hval = hval.decode('ascii')

I think we'll want to pass errors='ignore' here just to avoid a backtrace for
obviously wrong header values.

> +        # Handle MIME encoded-word syntax or other types of header encoding
> +        hval = str(email.header.make_header(email.header.decode_header(hval)))
> +        # Convert the header back into bytes for further processing
> +        hval = hval.encode('utf-8')
>          # We only do relaxed for headers
>          #    o  Unfold all header field continuation lines as described in
>          #       [RFC5322]; in particular, lines with terminators embedded in

Thanks!

-K
Paul Barker May 31, 2021, 1:46 p.m. UTC | #2
On Mon, 31 May 2021 09:04:59 -0400
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:

> On Sun, May 30, 2021 at 05:26:25PM +0100, Paul Barker wrote:
> > When testing patatt with patches sent to a sr.ht hosted mailing
> > list, it was found that long header lines (such as the
> > X-Developer-Signature line) were re-encoded using the MIME
> > encoded-word syntax (RFC 2047) when an mbox archive is generated,
> > causing patatt to choke on the resulting text which looks like this:
> > 
> >     X-Developer-Signature: v=1; a=openpgp-sha256; l=672;
> > h=from:subject; bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
> > =?utf-8?q?b=3DowGbwMvMwCF2?=
> > =?utf-8?q?w7xIXuiX9CvG02pJDAmb67lTNi0+IeF97TL76vtKD7xjSjaluz0o/KfmZLX8rMi7_?=
> > =?utf-8?q?l3M6O0pZGMQ4GGTFFFl2z951+fqDJVt7b0gHw8xhZQIZwsDFKQATydFhZJi+fFfvJ?=
> > =?utf-8?q?8+0MF7GrfzWnP?=
> > K7mAM/3n/r/UC+bprf6/g114QYGdbHcsaK7b1nanfA4IeZi1V0lL26cruXUWxgSEnNDP1FrAA=
> >  
> 
> Bah. I understand that we must handle this situation, but this is an
> annoying and unncessary hack when WSP characters are already
> conveniently present in the header.
> 
> > Avoiding this issue by neatly wrapping the X-Developer-Signature
> > header before sending doesn't appear to be possible without making
> > invasive changes to git-send-email and/or the Net::SMTP perl
> > module. The header content generated by patatt is wrapped at 78
> > characters as can be seen here from a locally signed patch file:
> > 
> >     X-Developer-Signature: v=1; a=openpgp-sha256; l=672;
> > h=from:subject; bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
> >     b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmbN1xO2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv
> >     Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBMpG0Dw/9Kpzgpc8UsQwOPK/taW6
> >     dFnZyy5QlXPfNCC4WTc76ft9ZnZJjI37a17fP7sxvclKJ1tm36EhITcK62Pphje9KrmOxMJg4A
> > 
> > Running `git send-email --smtp-debug=1 0001.patch` shows that this
> > is joined into a single long line before the message is sent:
> >   
> >     Net::SMTP::_SSL=GLOB(0x5646fbdc3ac8)>>> X-Developer-Signature:
> > v=1; a=openpgp-sha256; l=672; h=from:subject;
> > bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
> > b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmb571P2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv
> > Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBM5JA3I8O5hP6Tqm7lJst0rldcux
> > 1V7M4q8T5o1fPU6Zs+hxj+SjvN8D/DK3rn8b0m34/Xy388Yeu8jvFdJf/c6Y6LDU7Hulj01nAAAA==
> >  
> 
> This, too, bugs me to no end. :) I've actually considered making
> --hook not perform any wrapping, but I'm hoping that git-send-email
> can be fixed to not do this. To be compliant with (admittedly old and
> obsoleted) email RFCs, headers must be no longer than 78 characters,
> so I'm pretty sure what git-send-email does is an unintentional
> side-effect.
> 
> > So we need to accept that the X-Developer-Signature line may be
> > quite long and so may be re-encoded by a mail server or archiver.  
> 
> Agreed, thanks for finding this.
> 
> >      @staticmethod
> >      def _dkim_canonicalize_header(hval: bytes) -> bytes:  
> 
> I think it would make sense to wrap this around in:
> 
> if hval.find(b'?q?') >= 0:
> 
> This will save unnecessary decoding/encoding/header parsing. Do you
> agree?
> 

Agreed.

> > +        # The decode_header() function we're about to call
> > requires a str
> > +        # argument. Since RFC2822 (sec 2.2), header fields must be
> > ASCII
> > +        # characters so this is easy to achieve.
> > +        hval = hval.decode('ascii')  
> 
> I think we'll want to pass errors='ignore' here just to avoid a
> backtrace for obviously wrong header values.

Agreed.

I'll send a v2.

Thanks,
diff mbox series

Patch

diff --git a/patatt/__init__.py b/patatt/__init__.py
index 460d282..4e4d5c7 100644
--- a/patatt/__init__.py
+++ b/patatt/__init__.py
@@ -91,7 +91,7 @@  class DevsigHeader:
 
     def from_bytes(self, hval: bytes) -> None:
         self.hval = DevsigHeader._dkim_canonicalize_header(hval)
-        hval = re.sub(rb'\s*', b'', hval)
+        hval = re.sub(rb'\s*', b'', self.hval)
         for chunk in hval.split(b';'):
             parts = chunk.split(b'=', 1)
             if len(parts) < 2:
@@ -392,6 +392,14 @@  class DevsigHeader:
 
     @staticmethod
     def _dkim_canonicalize_header(hval: bytes) -> bytes:
+        # The decode_header() function we're about to call requires a str
+        # argument. Since RFC2822 (sec 2.2), header fields must be ASCII
+        # characters so this is easy to achieve.
+        hval = hval.decode('ascii')
+        # Handle MIME encoded-word syntax or other types of header encoding
+        hval = str(email.header.make_header(email.header.decode_header(hval)))
+        # Convert the header back into bytes for further processing
+        hval = hval.encode('utf-8')
         # We only do relaxed for headers
         #    o  Unfold all header field continuation lines as described in
         #       [RFC5322]; in particular, lines with terminators embedded in