UnicodeEncodeError when redirecting Python output to a file

Python 2.7: when output is redirected to a file, printing non-ASCII text fails with:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-48: ordinal not in range(128)

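Solution: set the PYTHONIOENCODING environment variable so stdout uses UTF-8 even when it is not a terminal (otherwise Python 2.7 falls back to the ASCII codec):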

% PYTHONIOENCODING=UTF-8 python xxx.py

Reference: http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file
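
Why the variable helps, as a minimal Python 3 sketch (the helper name encode_through_stream is hypothetical): a redirected stdout is a byte stream behind a text layer that must encode every character with its configured codec. In Python 2.7 a non-terminal stdout has no encoding, so print falls back to ASCII and any non-ASCII character fails; with UTF-8 the same write succeeds. Here io.BytesIO plus io.TextIOWrapper stand in for the redirected file.

```python
import io


def encode_through_stream(text, encoding):
    """Write text through a text layer with the given encoding and return the raw bytes.

    Models print() output going to a redirected stdout: the text layer
    encodes each character with its codec before it reaches the file.
    """
    raw = io.BytesIO()                        # stands in for the redirected file
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)                        # raises UnicodeEncodeError if the codec can't encode text
    stream.flush()
    data = raw.getvalue()
    stream.detach()                           # release raw without closing it
    return data


text = u"\u5730\u56fe\u89c6\u56fe"

print(encode_through_stream(text, "utf-8"))   # what PYTHONIOENCODING=UTF-8 gives you

try:
    encode_through_stream(text, "ascii")      # what Python 2.7 falls back to on a redirected stdout
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)
```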

Opening a file with an encoding argument in Python 2.7

import io
with io.open(fname, "rt", encoding="utf-8") as f:
    text = f.read()  # already decoded: text is a unicode object

http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2
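
A self-contained round trip with io.open, as a sketch (the helper name utf8_roundtrip and the temporary file are illustrative): writing with encoding="utf-8" stores UTF-8 bytes on disk, and reading back with the same encoding returns text rather than bytes. io.open exists in both Python 2.7 and Python 3 (where it is the built-in open); the sketch was written with Python 3 in mind.

```python
import io
import os
import tempfile


def utf8_roundtrip(text):
    """Write text to a temporary file as UTF-8, then return (raw bytes on disk, decoded text)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # io.open reopens the file by name below
    try:
        with io.open(path, "wt", encoding="utf-8") as f:
            f.write(text)                     # text -> UTF-8 bytes
        with io.open(path, "rb") as f:
            raw = f.read()                    # the bytes actually stored
        with io.open(path, "rt", encoding="utf-8") as f:
            decoded = f.read()                # UTF-8 bytes -> text
    finally:
        os.remove(path)
    return raw, decoded


raw, decoded = utf8_roundtrip(u"\u5730\u56fe\u89c6\u56fe")
print("%d bytes on disk, %d characters" % (len(raw), len(decoded)))  # 12 bytes on disk, 4 characters
```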

Also: decoding \uXXXX escape sequences (Python 2)

>>> s="\u5730\u56fe\u89c6\u56fe"
>>> print s
\u5730\u56fe\u89c6\u56fe
>>> print s.decode("unicode-escape")
地图视图
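
In Python 3 the same trick needs one adjustment: str has no .decode(), and a normal literal "\u5730" is already the character itself. A minimal sketch (the function name unescape_u is hypothetical) that keeps the backslashes with a raw string and decodes through ASCII bytes:

```python
def unescape_u(escaped):
    """Turn literal \\uXXXX escape sequences in an ASCII str into the real characters (Python 3)."""
    return escaped.encode("ascii").decode("unicode-escape")


s = r"\u5730\u56fe\u89c6\u56fe"   # raw string: backslashes stay literal, like the Python 2 byte string
print(unescape_u(s))               # 地图视图

# The reverse direction: characters -> escape sequences
print("地图视图".encode("unicode-escape").decode("ascii"))  # \u5730\u56fe\u89c6\u56fe
```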

Also: setting the encoding in code (Python 2)

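# coding: UTF-8
# (the line above only declares how this .py file itself is encoded)
import sys

# site.py deletes sys.setdefaultencoding at startup; reload(sys) brings it back
reload(sys)
sys.setdefaultencoding("utf-8")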
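
sys.setdefaultencoding only changes the codec Python 2 uses for implicit str/unicode conversions (for example when concatenating the two), and it is generally discouraged because it changes that behavior for every module in the process. An alternative approach, not the one above, is to convert explicitly where data enters and leaves the program. A minimal Python 3 sketch with hypothetical helper names to_text and to_bytes; Python 3 has no setdefaultencoding, and its default encoding is always UTF-8:

```python
import sys


def to_text(value, encoding="utf-8"):
    """Decode bytes to str; pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Encode str to bytes; pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(sys.getdefaultencoding())          # utf-8 on Python 3, not configurable
data = b"\xe5\x9c\xb0\xe5\x9b\xbe"       # UTF-8 bytes, e.g. read from a socket or file
text = to_text(data)                     # decode once on the way in
print(len(data), len(text))              # 6 2
print(to_bytes(text) == data)            # True: encode once on the way out
```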

One more: decoding HTML entities

Python 2.6-3.3
You can also use the HTML parser from the standard library:

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print h.unescape('&pound;682m')
£682m
see http://docs.python.org/2/library/htmlparser.html

Python 3.4+
HTMLParser.unescape is deprecated; it was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9. Use html.unescape() instead:

import html
print(html.unescape('&pound;682m'))
see https://docs.python.org/3/library/html.html#html.unescape
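
If one script has to run on both Python 2 and Python 3, a common pattern is to try the Python 3 import first and fall back to the Python 2 HTMLParser method. A sketch (the fallback branch only runs on Python 2, which has no html module); numeric character references, hex or decimal, are decoded as well as named ones:

```python
try:
    from html import unescape                 # Python 3.4+
except ImportError:                           # Python 2: no html module
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

print(unescape(u"&pound;682m"))               # £682m
print(unescape(u"&#x5730;&#22270;"))          # 地图 (hex and decimal references)
print(unescape(u"&lt;b&gt;"))                 # <b>
```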

Detecting a file's encoding on Linux:

file --mime known_issues.html~
known_issues.html~: text/html; charset=iso-8859-1
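
file --mime only guesses. A rough Python sketch of the same idea (the helper guess_encoding and its candidate list are assumptions, not what file does internally): feed it raw bytes read in "rb" mode, try strict decodes in order, and report the first codec that succeeds. ISO-8859-1 maps every byte to a character, so it always succeeds and only makes sense as the last fallback; the third-party chardet library does real statistical detection.

```python
def guess_encoding(data, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate codec that decodes data without errors, or None.

    A crude heuristic: valid UTF-8 rarely happens by accident, so it is tried first;
    iso-8859-1 accepts any byte sequence and therefore acts as a catch-all.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


print(guess_encoding(u"\u5730\u56fe".encode("utf-8")))      # utf-8
print(guess_encoding(u"\u00a3682m".encode("iso-8859-1")))   # iso-8859-1
```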

Converting a file's encoding on Linux:

iconv -f iso-8859-1 -t utf-8 known_issues.html~ > known_issues.html.utf-8
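
The same conversion can be done from Python when iconv is not available. A minimal sketch (the function name convert_encoding and the temporary files in the usage part are hypothetical): decode with the source encoding, then re-encode with the target. newline="" keeps line endings untouched, since iconv only changes the character encoding.

```python
import io
import os
import shutil
import tempfile


def convert_encoding(src, dst, from_encoding="iso-8859-1", to_encoding="utf-8"):
    """Re-encode the text file src into dst, like `iconv -f FROM -t TO src > dst`."""
    # newline="" disables newline translation, so \r\n and \n survive unchanged
    with io.open(src, "r", encoding=from_encoding, newline="") as fin:
        text = fin.read()
    with io.open(dst, "w", encoding=to_encoding, newline="") as fout:
        fout.write(text)


# Usage: convert a small ISO-8859-1 file and inspect the resulting bytes.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "latin1.txt")
dst = os.path.join(tmpdir, "utf8.txt")
with io.open(src, "wb") as f:
    f.write(b"\xa3682m\n")                    # "£682m" in ISO-8859-1
convert_encoding(src, dst)
with io.open(dst, "rb") as f:
    converted = f.read()
shutil.rmtree(tmpdir)                         # clean up the temporary files
print(converted)                              # b'\xc2\xa3682m\n'
```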

References:
http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string
http://www.zhihu.com/collection/58495075

Addendum: another good article. It covers how to detect a string's encoding with chardet.detect(string)['encoding'] and how to deal with bytes versus text strings.
http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes

update: 16-11-28
Another good article: https://segmentfault.com/a/1190000007594453