Be explicit about byte and encoding in command module
This module reads output from a command (via a pipe) one line at
a time. The only input we should receive from it is either:
* a byte string from the command output terminating with a \n
* a python "string" terminating with a \n
* that means in python2, a bytestring with a \n
* and in python3, a unicode string with a \n
For now, we only need to focus on python2 because we explicitly run
this code under python2, however, it's wise to be forward-compatible
with python3.
The error in the previous version of this code is to assume that the
value we read from the command was a unicode string which needed to
be encoded in order to be written to the log file. That's incorrect;
what we receive from the command should already be encoded according
to the system locale.
This change no longer encodes the lines received from the command
(because they are always bytestrings, in python2 or python3, they
will follow the code path where no further encoding happens). Of course,
if it turns out not to be encoded in utf-8, then zuul_stream is likely
going to bomb because it assumes everything it reads is utf-8, but
that's a different problem. In practice, we have utf-8 or C locales
universally at the moment.
Finally, there is a bunch of explicit encoding and bytestring handling
added to this method. That is mostly in service of the future codepath
under python3; elsewhere in this file we call ".addLine('[Zuul] ...')".
Under python2, that's a bytestring so no further work is necessary. In
python3, that's a unicode string, so we need to encode it.
We should never hit the exception handler, however, if somehow we manage
to, it should at least be able to write some data to the log file which
approximates what it was given.
Change-Id: Iae2f3ee012d914454c335184a8ec7c7ecb924ec7
diff --git a/zuul/ansible/library/command.py b/zuul/ansible/library/command.py
index f701b48..0fc6129 100644
--- a/zuul/ansible/library/command.py
+++ b/zuul/ansible/library/command.py
@@ -159,9 +159,14 @@
# Jenkins format but with microsecond resolution instead of
# millisecond. It is kept so log parsing/formatting remains
# consistent.
- ts = datetime.datetime.now()
- outln = '%s | %s' % (ts, ln)
- self.logfile.write(outln.encode('utf-8'))
+ ts = str(datetime.datetime.now()).encode('utf-8')
+ if not isinstance(ln, bytes):
+ try:
+ ln = ln.encode('utf-8')
+ except Exception:
+ ln = repr(ln).encode('utf-8') + b'\n'
+ outln = b'%s | %s' % (ts, ln)
+ self.logfile.write(outln)
def follow(fd, log_uuid):