How can I write to a file with the same formatting as print?

TL;DR

While trying to write a string to a file, the following error occurred:

Code

logfile.write(cli_args.last_name)

Output

UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128)

But this works:

Code

print(cli_args.last_name)

Output

Pérez

Why?

FULL CONTEXT

I made a script which receives data from a Linux CLI, processes it, and finally creates a Zendesk ticket with the provided data. It is a kind of CLI API: upstream of my script there is a bigger system with a web interface with forms, where users fill in the field values, which are then substituted into the CLI call. For example:

myscript.py --first_name '_first_name_' --last_name '_last_name_'

The script was working with no issues until yesterday, when the web interface was updated. I think they changed something related to charsets or encoding.

I do some simple logging with f-strings by opening a file and writing informative messages, so that if anything fails I can go back and check where it happened. The CLI arguments are read using the argparse module. Example:

logfile.write(f"\tChecking for opened tickets for user '{cli_args.first_name} {cli_args.last_name}'\n")

After the website update I am getting an error like this:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128)

Doing some troubleshooting I found it is because some users input names with accent marks like Carlos Pérez.

I need the script to work again and also to be prepared for inputs like that. I checked the HTTP headers of the input forms in the web console and found that they use Content-Type: text/html; charset=UTF-8. My first try was to encode the str passed in the CLI argument to UTF-8 and decode it again using the same codec, but that did not succeed.

On my second try, I checked the Python docs for str.encode() and bytes.decode(), and tried this:

logfile.write(
    "\tChecking for opened tickets for user "
    f"'{cli_args.first_name.encode(encoding='utf-8', errors='ignore').decode('utf-8')} "
    f"{cli_args.last_name.encode(encoding='utf-8', errors='ignore').decode('utf-8')}'"
)

It ran without an error, but it removed the accented letter, so Carlos Pérez became Carlos Prez. That is of no use to me in this case; I need the full input.

As a desperate move I tried printing the same f-string I was trying to write to the logfile, and to my surprise it worked. It printed Carlos Pérez to the console without any kind of encoding/decoding step.

How does print work? Why did writing to the file fail? But most importantly, how can I write to a file with the same formatting as print?

Edit 1 @MarkTolonen

Tried the following:

logfile = open("/usr/share/pandora_server/util/plugin/plugin_mcm/sandbox/755bug.txt", mode="a", encoding="utf8")
logfile.write(cli_args.body)
logfile.close()

Output:

Traceback (most recent call last):
  File "/usr/share/pandora_server/util/plugin/plugin_mcm/sandbox/ticket_query_app.py", line 414, in <module>
    main()
  File "/usr/share/pandora_server/util/plugin/plugin_mcm/sandbox/ticket_query_app.py", line 81, in main
    logfile.write(cli_args.body)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 8-9: surrogates not allowed

Edit 2

I managed to get the text that is causing the issue:

if __name__ == "__main__":
    string = (
        "Buenos d\udcc3\udcadas,\r\n\r\n"
        "Mediante  monitoreo autom\udcc3\udca1tico se ha detectado un evento fuera de lo normal:\r\n\r\n"
        "Descripci\udcc3\udcb3n del evento: _snmp_f13_\r\n"
        "Causas sugeridas del evento: _snmp_f14_\r\n"
        "Posible afectaci\udcc3\udcb3n del evento: _snmp_f15_\r\n"
        "Validaciones de bajo impacto: _snmp_f16_\r\n"
        "Fecha y hora del evento: 2021-07-14 17:47:51\r\n\r\n"
        "Saludos."
    )

    # Output: Text with the unicodes translated
    print(string)

    # Output: "UnicodeEncodeError: 'utf-8' codec can't encode characters in position 8-9: surrogates not allowed"
    with open(file="test.log", mode="w", encoding="utf8") as logfile:
        logfile.write(string)
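
A possible explanation for the difference, offered as a hedged sketch rather than a confirmed diagnosis: the \udcXX code points look like what Python's surrogateescape error handler produces when UTF-8 bytes are decoded under an ASCII (C/POSIX) locale, which is how sys.argv is decoded. Under such a locale, sys.stdout typically also uses surrogateescape, so print turns the surrogates back into the original bytes, while a file opened with the default errors="strict" refuses them. The snippet below imitates both streams with io.TextIOWrapper; the helper name try_write is hypothetical.

```python
import io

# Same kind of text as in the example above: "días" whose UTF-8 bytes
# C3 AD were smuggled into the str as lone surrogates.
text = "d\udcc3\udcadas"


def try_write(errors):
    """Write `text` to an in-memory UTF-8 text stream using the given error handler."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        # Strict encoding rejects lone surrogates.
        return type(exc).__name__
    return buffer.getvalue()


strict_result = try_write("strict")           # behaves like open(..., encoding="utf8")
escape_result = try_write("surrogateescape")  # behaves like stdout under a C/POSIX locale

print(strict_result)
print(escape_result)
```

Checking sys.stdout.encoding and sys.stdout.errors inside the real script would confirm or rule out this assumption for the actual environment.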


Solution 1:[1]

The answer is the encoding parameter to open. Observe:

Last login: Wed Jul 14 15:05:24 2021 from 50.126.68.34
[timrprobocom@jared-ingersoll ~]$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('x.txt','a')
>>> g = open('y.txt','a',encoding='utf-8')
>>> s = "spades \u2660 spades"
>>> f.write(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2660' in position 7: ordinal not in range(128)
>>> g.write(s)
15
>>>
[timrprobocom@jared-ingersoll ~]$ hexdump -C y.txt
00000000  73 70 61 64 65 73 20 e2  99 a0 20 73 70 61 64 65  |spades ... spade|
*
00000011
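
Note that encoding='utf-8' alone is not enough for the text shown in Edit 2, because that text contains lone surrogates (\udcc3\udcad). A minimal sketch of one way to handle it, assuming those surrogates come from UTF-8 bytes decoded with the surrogateescape handler: pass errors="surrogateescape" to open as well, or repair the string once before using it. The file location and variable names here are illustrative only.

```python
import os
import tempfile

# Text as the script appears to receive it: the UTF-8 bytes C3 AD of "í"
# are represented as the lone surrogates \udcc3 \udcad.
raw = "Buenos d\udcc3\udcadas"

# Option 1: let the file object translate the surrogates back into bytes.
path = os.path.join(tempfile.mkdtemp(), "test.log")
with open(path, mode="w", encoding="utf-8", errors="surrogateescape") as logfile:
    logfile.write(raw)

with open(path, "rb") as f:
    data = f.read()  # raw bytes on disk, valid UTF-8

# Option 2: repair the string once, then use it anywhere (files, HTTP APIs, ...).
repaired = raw.encode("utf-8", errors="surrogateescape").decode("utf-8")

print(data)
print(repaired)
```

Option 2 may be the more convenient choice when the same value is also sent elsewhere, for example in a ticket body, since the repaired string no longer contains surrogates.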

Solution 2:[2]

While not a direct answer to your question, I feel like this logic can be handled with a single UPDATE statement in mysql directly. Something like:

WITH updateCTE AS 
(
   SELECT DailyTaskID, 
      CASE WHEN LEAD(DailyTaskID) OVER (ORDER BY starttime DESC) IS NOT NULL THEN 0 ELSE 1 END as lastRecordInd,
      CASE WHEN LEAD(DailyTaskID) OVER (ORDER BY starttime DESC) IS NOT NULL THEN LAG(starttime) OVER (PARTITION BY starttime DESC) ELSE starttime END as lasttime
)
UPDATE tbldailytasks
SET 
   duration = CASE WHEN lastRecordInd = 0 THEN lasttime - starttime ELSE 0 END,
   endtime = CASE WHEN lastRecordInd = 0 THEN lasttime ELSE starttime END
where taskdate =<indate>;

I may be a bit off and some of this logic needs somewhat newer versions of mysql, but wanted to offer up a way out of Access and row-by-row updates.

Solution 3:[3]

Resolved.

There were a couple of issues.

  1. The MySQL 8.0 database version had many changes that my older version did not accommodate, one being that my autonumbers were not autonumbering. So I rebuilt my tables to remove errors and deprecated attributes.
  2. The MySQL ODBC connector has configuration options that I had forgotten needed to be set. Primarily: a) Enable dynamic cursors and b) Return matched rows instead of affected rows. (This is probably the most important option.)

BTW, I tried a version of the SQL statement provided above and it worked nicely. This is my version:

with cteDailytasks as (
select dailytaskautono, dailytaskid, starttime, endtime, duration
    , lag(starttime) over w as 'nextstart'
    , lead(starttime) over w as 'prevstart' 
 from tbldailytasks 
where taskdate = '2022-01-17'
window w as (order by starttime desc))
update tbldailytasks a,
(select dailytaskautono, nextstart, prevstart 
from cteDailyTasks) as starttimes
set a.endtime = case when starttimes.nextstart is null then a.starttime else starttimes.nextstart end, 
    a.duration = case when starttimes.nextstart is null then cast(current_date() as datetime) else cast(timediff(starttimes.nextstart,a.starttime) as datetime) end 
where a.dailytaskautono = starttimes.dailytaskautono;

Thanks for the replies folks.

Paul

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Roberts
Solution 2 JNevill
Solution 3 Paul T.