Python Notes: Modifying a CSV File at Specific Positions

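I am taking part in the PhysioNet Challenge 2019, and the task requires the data to be preprocessed, so I used Python's csv module to apply a rule-based modification to roughly 200,000 rows. The rule: find each row whose last field is "1.0" while the last field of the row above it is "0.0", then change the last field of the rows within the 12 rows above that row to "1.0".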
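To make the rule concrete, here is a minimal sketch that applies it to a plain list of label strings rather than to a CSV file. The function name `backfill_labels` and the `window` parameter are hypothetical, and the sketch assumes the whole list belongs to a single record.

```python
def backfill_labels(labels, window=12):
    """Return a copy of labels with up to `window` entries before each onset set to '1.0'."""
    result = list(labels)
    for i in range(1, len(labels)):
        # An onset: this label is '1.0' and the one before it is '0.0'.
        # Detection reads the original list, so earlier changes cannot create new onsets.
        if labels[i] == '1.0' and labels[i - 1] == '0.0':
            for j in range(max(0, i - window), i):
                result[j] = '1.0'
    return result


# A window of 2 keeps the demonstration short; the rule in this post uses 12.
example = backfill_labels(['0.0', '0.0', '0.0', '0.0', '1.0', '1.0'], window=2)
print(example)  # ['0.0', '0.0', '1.0', '1.0', '1.0', '1.0']
```

When fewer than `window` entries precede the onset, `max(0, i - window)` simply stops at the start of the list.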

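The procedure is as follows: ① find every row that needs to be modified and store its characteristics (row number and a marker value) in a dictionary; ② process the dictionary according to the rule, so that it also covers the rows above each match; ③ create a new file and traverse every row of the source file: a row that is not in the dictionary is copied unchanged, while a row that is in the dictionary has its last field changed to "1.0" before being copied; ④ the resulting new file is the processed data file we need.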
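The four steps can be tried on in-memory data before touching real files. The sketch below is illustrative only: the function `relabel`, the `marked` dictionary and its 'onset'/'backfill' values are hypothetical names, and it treats the input as one continuous sequence without consulting any other column.

```python
import csv
import io


def relabel(source, target, window=12):
    """Copy CSV rows from source to target, back-filling '1.0' before each onset."""
    rows = list(csv.reader(source))

    # Step 1: find the rows to modify and store them in a dictionary
    # keyed by row index (the value records why the row was selected).
    marked = {}
    for index in range(1, len(rows)):
        if rows[index][-1] == '1.0' and rows[index - 1][-1] == '0.0':
            marked[index] = 'onset'

    # Step 2: extend the dictionary to the rows above each onset.
    for index in [key for key, reason in marked.items() if reason == 'onset']:
        for above in range(max(0, index - window), index):
            marked.setdefault(above, 'backfill')

    # Steps 3 and 4: copy every row, changing the last field of marked rows.
    writer = csv.writer(target)
    for index, row in enumerate(rows):
        if index in marked:
            row = row[:-1] + ['1.0']
        writer.writerow(row)
    return marked


source = io.StringIO('a,0.0\nb,0.0\nc,0.0\nd,1.0\n')
target = io.StringIO()
marked = relabel(source, target, window=2)
print(target.getvalue())
```

Passing file-like objects instead of paths keeps the logic separate from file handling, so the same function would work with files opened using `newline=''`.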

The code, with comments, is below. I have tested it and it works:

import csv

# Maps the index of each row that must be relabelled to a marker value
# derived from the second-to-last field of the matching row.
dictOrig = {}

# Open the file to be modified. Note that the file path must be correct.
with open('C:/Users/Ethan/PycharmProjects/readcsv/Sepsis.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    # rows holds one list of strings per CSV row, for example
    # [['1.0', '2.0', '#', '!'], ['1.0', '2.0', '#', '!'], ...]
    rows = [rowTemp for rowTemp in reader]

# Step 1: record every row whose last field is '1.0' while the last field of
# the row above it is '0.0'. The loop starts at 1 so that the first row is
# never compared with the last one (index -1), and it uses the loop index
# instead of rows.index(row), which returns the first row with equal content.
for index in range(1, len(rows)):
    if rows[index][-1] == '1.0' and rows[index - 1][-1] == '0.0':
        dictOrig[index] = rows[index][-2]

# Step 2: also mark the rows above each recorded row. The second-to-last field
# is treated as the position of the row within its record: if it is at least
# 13, the 12 rows above are marked; otherwise only as many rows as that value
# are marked (intended to cover all earlier rows of the record).
# Iterating over a snapshot of the items keeps newly added keys out of the loop.
for index, temp in list(dictOrig.items()):
    position = float(temp)
    count = 12 if position >= 13.0 else int(position)
    for i in range(count):
        dictOrig[index - 1 - i] = str(position - i - 1)

# Steps 3 and 4: copy every row to the new file, setting the last field to
# '1.0' for the rows recorded in dictOrig. newline='' avoids generating blank
# lines between rows. Mode 'a' opens the file for writing and appends to the
# end if it already exists, so remove an old New.csv before running again.
with open('C:/Users/Ethan/PycharmProjects/readcsv/New.csv', 'a', newline='') as newfile:
    writer = csv.writer(newfile)
    for index, row in enumerate(rows):
        if index in dictOrig:
            # This row must be changed, so change it while copying.
            row[-1] = '1.0'
        writer.writerow(row)

# Both files are closed automatically when their with blocks end.
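One way to check the result is to compare the source file with the new file row by row. The sketch below is not part of the original script: the helper `count_relabelled_rows` is a hypothetical name, and the demonstration uses two small temporary files instead of Sepsis.csv and New.csv. It assumes that only the last field of a row may differ and that both files have the same number of rows.

```python
import csv
import os
import tempfile


def count_relabelled_rows(source_path, result_path):
    """Count rows whose last field changed; raise if anything else differs."""
    with open(source_path, newline='') as src, open(result_path, newline='') as dst:
        source_rows = list(csv.reader(src))
        result_rows = list(csv.reader(dst))
    if len(source_rows) != len(result_rows):
        raise ValueError('row counts differ: %d vs %d'
                         % (len(source_rows), len(result_rows)))
    changed = 0
    for number, (old, new) in enumerate(zip(source_rows, result_rows)):
        if old[:-1] != new[:-1]:
            raise ValueError('row %d changed outside its last field' % number)
        if old[-1] != new[-1]:
            changed += 1
    return changed


# Demonstration on two small temporary files.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, 'source.csv')
    result_path = os.path.join(folder, 'result.csv')
    with open(source_path, 'w', newline='') as handle:
        csv.writer(handle).writerows([['a', '0.0'], ['b', '0.0'], ['c', '1.0']])
    with open(result_path, 'w', newline='') as handle:
        csv.writer(handle).writerows([['a', '0.0'], ['b', '1.0'], ['c', '1.0']])
    changed = count_relabelled_rows(source_path, result_path)
    print(changed)  # 1
```

The row-count check is useful when the output file is opened in append mode, because running the script twice would then double the number of rows.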


 
