python输出重定向到文件时候的UnicodeEncodeError问题
python 2.7, 重定向输出到文件时候出现错误:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-48: ordinal not in range(128)
解决方法:
% PYTHONIOENCODING=UTF-8 python xxx.py
参考:http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file
python 2.7打开文件带编码参数
import io
with io.open(fname, "rt", encoding="utf-8") as f:
http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2
另外:解码unicode
>>> s="\u5730\u56fe\u89c6\u56fe"
>>> print s
\u5730\u56fe\u89c6\u56fe
>>> print s.decode("unicode-escape")
地图视图
另外:代码中设定编码
# coding: UTF-8
reload(sys)
sys.setdefaultencoding("utf-8")
另另外:解码html code
Python 2.6-3.3
You can also use the HTML parser from the standard lib
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print h.unescape('£682m')
£682m
see http://docs.python.org/2/library/htmlparser.html
Python 3.4+
HTMLParser.unescape is deprecated, and was supposed to be removed in 3.5, although it was left in by mistake. It will be removed from the language soon. Instead, use html.unescape():
import html
print(html.unescape('£682m'))
see https://docs.python.org/3/library/html.html#html.unescape
Linux获取文件的编码:
file --mime known_issues.html~
known_issues.html~: text/html; charset=iso-8859-1
Linux转换文件编码:
iconv -f iso-8859-1 -t utf-8 known_issues.html~ > known_issues.html.utf-8
参考:
http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string
http://www.zhihu.com/collection/58495075
补充:
另外一个挺好的文章。其中提到如何检测一个字符串的编码chardet.detect(string)['encoding']
和如何处理字节和字串。
http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
update: 16-11-28
另外一篇好文章: https://segmentfault.com/a/1190000007594453