Translated from: Writing a pandas DataFrame to CSV file
I have a DataFrame in pandas that I would like to write to a CSV file. I am doing this using:
df.to_csv('out.csv')
And I get the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)
Is there an easy way around this (i.e. I have Unicode characters in my DataFrame)? And is there a way to write a tab-delimited file instead of a CSV, e.g. with a 'to-tab' method (which I don't think exists)?
Reference: https://stackoom.com/question/190W9/将pandas-DataFrame写入CSV文件
To delimit by a tab, you can use the sep argument of to_csv:
df.to_csv(file_name, sep='\t')
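There is no separate to_tab method; to_csv with sep='\t' is how tab-separated output is produced, and read_csv with the same sep reads it back. A minimal sketch on a small made-up DataFrame (the column names and values are illustrative only):

```python
import io

import pandas as pd

# Illustrative data only
df = pd.DataFrame({"letter": ["a", "b"], "count": [1, 2]})

# With no path argument, to_csv returns the text instead of writing a file
tsv_text = df.to_csv(sep="\t", index=False)
print(tsv_text)

# Reading it back requires the same separator
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(round_trip.equals(df))
```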
To use a specific encoding (e.g. 'utf-8'), use the encoding argument:
df.to_csv(file_name, sep='\t', encoding='utf-8')
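The error in the question comes from Python 2, where older pandas defaulted to ASCII when writing; on Python 3, to_csv already defaults to 'utf-8', and passing it explicitly does no harm. A sketch that writes the character from the error message (U+03B1) to a temporary tab-separated file and reads it back; the data and file name are made up for illustration:

```python
import os
import tempfile

import pandas as pd

# Illustrative data containing the character from the error message
df = pd.DataFrame({"symbol": ["\u03b1", "beta"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "out.tsv")
    # Tab-separated and explicitly UTF-8 encoded
    df.to_csv(file_name, sep="\t", encoding="utf-8")

    # Inspect the raw bytes that were written
    with open(file_name, "rb") as fh:
        raw = fh.read()

    # Read back with the same separator and encoding; column 0 is the index
    restored = pd.read_csv(file_name, sep="\t", encoding="utf-8", index_col=0)

print(restored)
```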
Sometimes you run into these problems even when you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and using the same encoding when writing it. This might solve your problem.
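A sketch of that advice, assuming a hypothetical input file that is Latin-1 encoded (so the registered sign is the single byte 0xAE, which is not valid UTF-8 on its own and would fail to decode under the default). The file is decoded with its real encoding and written back with the same one:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "in.csv")
    dst = os.path.join(tmp_dir, "out.csv")

    # Hypothetical Latin-1 input file
    with open(src, "wb") as fh:
        fh.write(b"name\nRegenexx\xae\n")

    # Read with the file's actual encoding, then write with the same one
    df = pd.read_csv(src, encoding="latin-1")
    df.to_csv(dst, index=False, encoding="latin-1")

    with open(dst, "rb") as fh:
        written = fh.read()

print(df)
```

If the output is meant for tools that expect UTF-8, writing with encoding='utf-8' after reading with 'latin-1' is equally valid; the key step is decoding the input correctly.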
If you are having trouble encoding to 'utf-8' and want to go cell by cell, you could try the following.
Python 2
(Where df is your DataFrame object.)
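# Coerce every cell to unicode; with errors='ignore', any byte the default
# ASCII codec cannot decode is silently dropped.
# df.at replaces get_value/set_value, which are deprecated in later pandas.
for column in df.columns:
    for idx in df[column].index:
        x = df.at[idx, column]
        try:
            if isinstance(x, unicode):
                x = unicode(x.encode('utf-8', 'ignore'), errors='ignore')
            else:
                x = unicode(str(x), errors='ignore')
            df.at[idx, column] = x
        except Exception:
            print 'encoding error: {0} {1}'.format(idx, column)
            df.at[idx, column] = ''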
Then try:
df.to_csv(file_name)
You can check the type of the values in each column (here, the type of the first value) with:
for column in df.columns:
    print '{0} {1}'.format(str(type(df[column].iloc[0])), str(column))
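Looking only at the first value can miss a column that mixes types further down. A Python 3 sketch that collects every distinct type per column instead; the sample DataFrame, which mixes str and bytes in one column, is made up for illustration:

```python
import pandas as pd

# Illustrative DataFrame mixing str and bytes in one column
df = pd.DataFrame({"name": ["\u03b1", b"Regenexx\xae"], "n": [1, 2]})

# Collect the distinct Python types in every column, not just the first row
cell_types = {
    column: sorted({type(value).__name__ for value in df[column]})
    for column in df.columns
}
for column, names in cell_types.items():
    print(column, names)
```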
Warning: errors='ignore' will silently drop any character it cannot decode, e.g.
IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'
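The same trade-off exists in Python 3, where the conversions are explicit bytes.decode and str.encode calls. A small sketch using the string from the warning; errors='replace' is a standard-library alternative that leaves a visible placeholder instead of silently dropping the character:

```python
# bytes -> str: "ignore" drops the byte ASCII cannot decode
dropped = b"Regenexx\xae".decode("ascii", errors="ignore")

# str -> bytes: "ignore" drops the character, "replace" substitutes "?"
ascii_ignored = "Regenexx\u00ae".encode("ascii", errors="ignore")
ascii_replaced = "Regenexx\u00ae".encode("ascii", errors="replace")

print(dropped)
print(ascii_ignored)
print(ascii_replaced)
```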
Python 3
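# get_value/set_value were removed in pandas 1.0; df.at does the same job.
# Convert to object dtype first so any column can hold the resulting strings.
df = df.astype(object)
for column in df.columns:
    for idx in df[column].index:
        x = df.at[idx, column]
        try:
            # Keep str cells as they are; convert anything else to str,
            # dropping characters UTF-8 cannot encode
            x = x if isinstance(x, str) else str(x).encode('utf-8', 'ignore').decode('utf-8', 'ignore')
            df.at[idx, column] = x
        except Exception:
            print('encoding error: {0} {1}'.format(idx, column))
            df.at[idx, column] = ''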
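The explicit double loop can also be written with Series.map applied column by column. This is a swapped-in alternative, not the answer's code, and unlike the loop above it also cleans cells that are already str. The helper name clean_cell and the sample data are hypothetical:

```python
import pandas as pd


def clean_cell(value):
    """Convert a cell to str and drop anything UTF-8 cannot encode."""
    text = value if isinstance(value, str) else str(value)
    # Lone surrogates are the only str content UTF-8 cannot encode;
    # errors="ignore" drops them
    return text.encode("utf-8", "ignore").decode("utf-8")


# Illustrative data: a string with a lone surrogate, plus an int column
df = pd.DataFrame({
    "text": pd.Series(["ok\ud800", "\u03b1"], dtype=object),
    "n": [1, 2],
})

# Series.map applies clean_cell to every cell of each column
cleaned = df.apply(lambda column: column.map(clean_cell))
print(cleaned)
```

Every cell ends up as a str, so numeric columns are written to the CSV as text; that matches what the loop does but may not be wanted if the DataFrame is used for anything else afterwards.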
When you store a DataFrame object in a CSV file using the to_csv method, you probably won't need to store the index that precedes each row of the DataFrame.
You can avoid that by passing the boolean value False to the index parameter.
Something like:
df.to_csv(file_name, encoding='utf-8', index=False)
So if your DataFrame object is something like:
Color Number
0 red 22
1 blue 10
the CSV file will store:
Color,Number
red,22
blue,10
instead of (what you get with the default index=True):
,Color,Number
0,red,22
1,blue,10
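The example above can be reproduced directly; with no file name, to_csv returns the CSV text, which makes the difference easy to see:

```python
import pandas as pd

# The example DataFrame from the text
df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# Default index=True writes the row labels as an unnamed first column
with_index = df.to_csv()
# index=False omits them
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```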
This may not be the answer for this case, but I had the same error message with .to_csv. When I tried .toCSV('name.csv') instead, the error message was different ("'SparseDataFrame' object has no attribute 'toCSV'"), which showed that the object was a SparseDataFrame. The problem was solved by converting it to a dense DataFrame:
df.to_dense().to_csv("submission.csv", index=False, sep=',', encoding='utf-8')
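SparseDataFrame was removed in pandas 1.0, so code relying on it may need updating. A sketch assuming pandas 1.0 or later, with made-up sparse columns: sparse data now lives in ordinary DataFrame columns with a Sparse dtype, and the DataFrame.sparse accessor (which requires every column to be sparse) converts it back to dense before writing:

```python
import pandas as pd

# Hypothetical sparse data stored in regular DataFrame columns
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0, 0, 1]),
    "b": pd.arrays.SparseArray([0, 2, 0]),
})

# Convert every sparse column back to its dense dtype
dense = df.sparse.to_dense()

# Write as usual; here to a string instead of "submission.csv"
csv_text = dense.to_csv(index=False, sep=",")
print(csv_text)
```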