String extraction from iterating and mixed lines in Python

I have a dataset like the one below:

"birth_date_1:25        birth_date_2:august     birth_date_3:1945    birth_place_1:france   death_date:   "
"birth_date_1:14        birth_date_2:june       birth_date_3:1995   birth_place_1:dvůr     birth_place_2:králové     birth_place_3:nad       birth_place_4:labem     birth_place_5:,     birth_place_6:czech     birth_place_7:republic  "
"birth_date_1:21        birth_date_2:february       birth_date_3:1869   birth_place_1:blackburn     birth_place_2:,     birth_place_3:england   death_date_1:12     death_date_2:march      death_date_3:1917   "
"birth_date_1:07        birth_date_2:july       birth_date_3:1979   birth_place_1:ghana     birth_place_2:,     birth_place_3:accra "
"birth_date_1:27        birth_date_2:february       birth_date_3:1979   birth_place_1:durban        birth_place_2:,     birth_place_3:south     birth_place_4:africa    "
"birth_date_1:1989  birth_place_1:lima      birth_place_2:,     birth_place_3:peru  "
"birth_date_1:5     birth_date_2:september      birth_date_3:1980   birth_place_1:angola    death_date:   "
"birth_date_1:1     birth_date_2:february       birth_date_3:1856   birth_place_1:hampstead     birth_place_2:,     birth_place_3:london    death_date_1:14     death_date_2:august     death_date_3:1905   "
"birth_date_1:28        birth_date_2:december       birth_date_3:1954   birth_place_1:hickory       birth_place_2:,     birth_place_3:north     birth_place_4:carolina  death_date:   "
"birth_date:  "
"birth_date:  birth_place:  death_date:   "
"birth_date:  birth_place_1:belfast       birth_place_2:,     birth_place_3:northern      birth_place_4:ireland   "
"birth_date:  birth_place:  death_date:   "
"birth_date_1:28        birth_date_2:february       birth_date_3:1891   birth_place_1:carberry      birth_place_2:,     birth_place_3:manitoba  death_date_1:20     death_date_2:september      death_date_3:1968   "
"birth_date_1:4     birth_date_2:november       birth_date_3:1993   birth_place_1:portimão     birth_place_2:,     birth_place_3:portugal  "

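Every line above is a run of whitespace-separated `key:value` tokens, where one field can be split over numbered keys (`birth_place_1`, `birth_place_2`, …) or be empty (`death_date:`). A minimal sketch of grouping those tokens with a regular expression, assuming keys are lower-case ASCII with an optional `_<number>` suffix and that values never contain spaces; `group_fields` and `TOKEN` are illustrative names only:

```python
import re
from collections import defaultdict

# One "key:value" token. The optional "_<n>" suffix is matched but not captured,
# so birth_place_1, birth_place_2, ... all end up under the key "birth_place".
TOKEN = re.compile(r'([a-z_]+?)(?:_\d+)?:(\S*)')


def group_fields(line):
    """Return {field: [values in original order]} for one quoted dataset line."""
    fields = defaultdict(list)
    for key, value in TOKEN.findall(line.strip().strip('"')):
        if value:  # skip empty fields such as "death_date:"
            fields[key].append(value)
    return dict(fields)


sample = ('"birth_date_1:21  birth_date_2:february  birth_date_3:1869  '
          'birth_place_1:blackburn  birth_place_2:,  birth_place_3:england  '
          'death_date_1:12  death_date_2:march  death_date_3:1917  "')
for field, values in group_fields(sample).items():
    print(field, values)
# birth_date ['21', 'february', '1869']
# birth_place ['blackburn', ',', 'england']
# death_date ['12', 'march', '1917']
```

Once the values are grouped by field, a missing field (no `death_date_*` at all) and an empty one (`death_date:`) look the same, which is what the `NA` cases need.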
From this dataset I am trying to extract the birth date, birth place and death date of each line in the following format:

25.08.1945 \t France \t NA
14.06.1995 \t Dvůr Králové nad Labem,Czech Republic \t NA
21.02.1869 \t Blackburn,England \t 12.03.1917
.
.
.
1989 \t Lima,Peru \t NA
.
.
.
NA \t NA \t NA
NA \t NA \t NA
NA \t Belfast, Northern Ireland \t NA
.
.
04.11.1993 \t Portimão,Portugal \t NA
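The date columns in this output mix full dates (25.08.1945), year-only dates (1989) and NA, so one option is to classify each date token by what it looks like rather than by its position. A minimal sketch, assuming English month names, four-digit years and one- or two-digit days; `format_date` is a hypothetical helper, and tokens that fit none of these patterns are silently ignored:

```python
# English month names (assumption), lower-case, mapped to numbers: "january" -> 1, ...
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}


def format_date(parts):
    """Format date tokens such as ['25', 'august', '1945'] as '25.08.1945'.

    Tokens are classified by content, not position: a month name, a
    four-digit year, or a one/two-digit day. Missing pieces are left out,
    so ['1989'] gives '1989'; no tokens at all gives 'NA'.
    """
    day = month = year = None
    for p in parts:
        if p.lower() in MONTHS:
            month = MONTHS[p.lower()]
        elif p.isdecimal() and len(p) == 4:
            year = p
        elif p.isdecimal() and len(p) <= 2:
            day = int(p)
    pieces = []
    if day is not None:
        pieces.append(f"{day:02d}")
    if month is not None:
        pieces.append(f"{month:02d}")
    if year is not None:
        pieces.append(year)
    return ".".join(pieces) or "NA"


print(format_date(["4", "november", "1993"]))  # 04.11.1993
print(format_date(["1989"]))                   # 1989
print(format_date([]))                         # NA
```

Because nothing depends on position, a line whose first date token is a year (the Lima row) or a month name goes through the same code path as a full date.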

I wrote the code below to achieve this. However, my dataset contains several scenarios (for example, the birth_date_1 value can be empty, a month name or a year), so the loop I came up with feels like it is going to fail somewhere and won't be feasible.

    import numpy as np

    # One row per input line, 9 columns, NaN until a value is filled in
    birthDatePlace = [[np.nan for i in range(9)] for j in range(20000)]

    with open('ornek_box_seperated.csv', 'r', encoding="utf-8") as inputfile, \
            open('ornek_box_seperated_update.csv', 'w', encoding="utf-8") as outputfile:
        for row, line in enumerate(inputfile):
            # Drop the surrounding quotes and split into "key:value" tokens
            tokens = line.strip().strip('"').split()
            # Keep only the values, skipping empty fields such as "death_date:"
            d = [v for v in (tok.partition(":")[2] for tok in tokens) if v]
            print(d)
            if d and d[0].isdigit():
                first = int(d[0])
                if 0 < first < 40:      # small number: day of the month
                    birthDatePlace[row][1] = first
                    if len(d) > 1 and not d[1].isdigit():
                        birthDatePlace[row][2] = d[1]  # month name
                elif first < 2020:      # larger number: year only
                    birthDatePlace[row][3] = first

            # This code is meant to continue from here until it covers all the
            # birth place and death date information in the required format

            outputfile.write("\t".join(d))
            outputfile.write("\n")
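Instead of preallocating a 20000×9 grid and guessing a value's meaning from its position, each line can be converted on its own: group the `key:value` tokens by field name (dropping the `_1`, `_2`, … suffixes and empty fields like `death_date:`), format the birth date, birth place and death date, and write one tab-separated row. The sketch below assumes lower-case ASCII keys, values without spaces, English month names, four-digit years and one- or two-digit days; `convert_line`, `convert_file` and the other helpers are hypothetical names, not from the original post.

```python
import re

# One "key:value" token; the "_<n>" suffix is dropped so numbered parts share a field.
TOKEN = re.compile(r'([a-z_]+?)(?:_\d+)?:(\S*)')
# English month names (assumption) mapped to their numbers: "january" -> 1, ...
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}


def group_fields(line):
    """Collect the non-empty values of one quoted line, grouped by field name."""
    fields = {}
    for key, value in TOKEN.findall(line.strip().strip('"')):
        if value:
            fields.setdefault(key, []).append(value)
    return fields


def format_date(parts):
    """Day/month/year tokens -> 'dd.mm.yyyy'; missing pieces are left out, nothing -> 'NA'."""
    day = month = year = None
    for p in parts:
        if p.lower() in MONTHS:
            month = MONTHS[p.lower()]
        elif p.isdecimal() and len(p) == 4:
            year = p
        elif p.isdecimal() and len(p) <= 2:
            day = int(p)
    pieces = [f"{day:02d}" if day is not None else "",
              f"{month:02d}" if month is not None else "",
              year or ""]
    return ".".join(x for x in pieces if x) or "NA"


def format_place(parts):
    """Join place tokens, glue commas to the previous word and title-case the result."""
    return " ".join(parts).replace(" , ", ",").title() or "NA"


def convert_line(line):
    """One dataset line -> 'birth date<TAB>birth place<TAB>death date'."""
    fields = group_fields(line)
    return "\t".join([format_date(fields.get("birth_date", [])),
                      format_place(fields.get("birth_place", [])),
                      format_date(fields.get("death_date", []))])


def convert_file(src, dst):
    """Convert every non-blank line of src and write one row per line to dst."""
    with open(src, encoding="utf-8") as inputfile, \
            open(dst, "w", encoding="utf-8") as outputfile:
        for line in inputfile:
            if line.strip():
                outputfile.write(convert_line(line) + "\n")
```

With the question's file names this would be called as `convert_file('ornek_box_seperated.csv', 'ornek_box_seperated_update.csv')`. Two differences from the sample output are worth knowing: `str.title()` capitalizes every word, so `nad` becomes `Nad` in the Czech place name, and commas are always written without a following space (`Blackburn,England`), unlike the Belfast row. Both can be adjusted in `format_place` once the exact rules are settled.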

I'd appreciate any help you can provide. I'm fairly new to Python, especially to regex and string extraction techniques.

Thank you in advance for your kind support.

asked by Kaan Karabal

Author: استخدام کار (Estekhdam Kar), Date: Tuesday, 9 Mordad 1397 (31 July 2018), 4:34