Regex and For Loop in Python

In this post, I will show how to use a regular expression in Python to parse the data in a text file.

1 = hoge1
2 = hoge2
3 = foo1
4 = foo2
5 = foo3

Let’s imagine this data is stored in a text file, data.txt. The Python script below reads the text file, loops through each match of the regex, and turns the raw data into a dictionary.

import re

if __name__ == '__main__':
    data_file = '/Users/hiriumi/tmp/data.txt'
    # The with statement closes the file automatically.
    with open(data_file, 'r') as f:
        data = f.read()

    data_parse_re = re.compile(r'(\d)\s*=\s*(.+)')
    dict_data = {} # create an empty dict object
    for num, val in data_parse_re.findall(data):
        dict_data[num] = val

    print(dict_data)

Here is the result.

{'1': 'hoge1', '2': 'hoge2', '3': 'foo1', '4': 'foo2', '5': 'foo3'}

You might wonder where num and val come from in the for loop. If you look at the regex, it has two sets of parentheses, called capture groups. findall() returns one tuple per match, and the for statement unpacks each tuple into num and val. num receives the match of the first group, \d, which matches a single digit. val receives the match of the second group, .+, which means one or more (+) of any character except a newline (.).
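To see this concretely, it can help to print what findall() returns before looping over it. The sketch below uses an inline string, sample, as a stand-in for the contents of data.txt. The second pattern, with \d+, is my own addition to show one way to handle keys of 10 and above, since \d on its own matches only one digit.

```python
import re

# Inline stand-in for the contents of data.txt (assumed for illustration).
sample = "1 = hoge1\n2 = hoge2\n10 = foo1\n"

single_digit_re = re.compile(r'(\d)\s*=\s*(.+)')
multi_digit_re = re.compile(r'(\d+)\s*=\s*(.+)')

# findall() returns a list of tuples, one tuple per match,
# with one element per capture group.
pairs = single_digit_re.findall(sample)
print(pairs)

# The for statement unpacks each tuple into num and val.
for num, val in pairs:
    print(num, val)

# With \d+, the key "10" is captured whole instead of only its last digit.
print(multi_digit_re.findall(sample))
```

With the single-digit pattern, the line "10 = foo1" is captured with the key '0', which is easy to miss in a larger file.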

If you have three sets of parentheses, you can add a third variable in the for loop to extract the extra value.
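Here is a minimal sketch of the three-group case. The input format with a trailing ": color" field is hypothetical and is not part of data.txt; it is only there to give the third group something to capture.

```python
import re

# Hypothetical three-field lines, not part of the original data.txt.
sample = "1 = hoge1 : red\n2 = hoge2 : blue\n"

# Three sets of parentheses, so findall() yields 3-element tuples.
three_group_re = re.compile(r'(\d)\s*=\s*(\w+)\s*:\s*(\w+)')

rows = {}
for num, val, color in three_group_re.findall(sample):
    rows[num] = (val, color)

print(rows)
```

The number of variables in the for statement has to match the number of groups, otherwise Python raises a ValueError when unpacking.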

Let’s take it one step further. If you want a zero-based index, you can wrap the result of the findall() method with enumerate(). Here is an example.

import re

if __name__ == '__main__':
    data_file = '/Users/hiriumi/tmp/data.txt'
    # The with statement closes the file automatically.
    with open(data_file, 'r') as f:
        data = f.read()

    data_parse_re = re.compile(r'(\d)\s*=\s*(.+)')
    for index, item in enumerate(data_parse_re.findall(data)):
        print(index, item[0], item[1])

Each item is a tuple. Its first element holds the match of the first set of parentheses and its second element holds the match of the second set. If there were a third set of parentheses, its match would be the third element of the tuple. Because enumerate() supplies the counter, you don’t need to maintain a separate variable and increment it on each iteration of the loop.
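If you prefer named variables over item[0] and item[1], the tuple can be unpacked directly in the for statement by putting it in parentheses. The sketch below again uses an inline sample string instead of data.txt, and it also shows the start argument of enumerate() in case a one-based counter is wanted.

```python
import re

# Inline stand-in for the contents of data.txt (assumed for illustration).
sample = "1 = hoge1\n2 = hoge2\n3 = foo1\n"
data_parse_re = re.compile(r'(\d)\s*=\s*(.+)')

# Nested unpacking: the counter goes to index, the tuple to (num, val).
lines = []
for index, (num, val) in enumerate(data_parse_re.findall(sample)):
    lines.append((index, num, val))
    print(index, num, val)

# enumerate() also accepts a start value if a one-based counter is wanted.
one_based = [i for i, _ in enumerate(data_parse_re.findall(sample), start=1)]
print(one_based)
```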

I find this quite useful when I have to parse logs. You might struggle with it at first, but I believe it is very important for a software engineer to be proficient with regular expressions.
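As a rough sketch of what that can look like, the example below applies the same findall() and enumerate() combination to a few log lines. The log format (date, time, level, message) and the names log_text and log_line_re are made up for illustration; a real log would need a pattern adapted to its own layout.

```python
import re
from collections import Counter

# Hypothetical log lines; real logs will need a pattern for their own format.
log_text = (
    "2024-01-15 10:32:07 INFO Service started\n"
    "2024-01-15 10:32:09 ERROR Disk full\n"
    "2024-01-15 10:33:01 INFO Retrying\n"
)

# Four groups: date, time, level, message.
# re.MULTILINE makes ^ and $ match at the start and end of each line.
log_line_re = re.compile(r'^(\S+) (\S+) (\w+) (.+)$', re.MULTILINE)

level_counts = Counter()
errors = []
for index, (date, time, level, message) in enumerate(log_line_re.findall(log_text)):
    level_counts[level] += 1
    if level == 'ERROR':
        # Keep the zero-based position of the entry along with its fields.
        errors.append((index, date, time, message))

print(level_counts)
print(errors)
```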

Author: admin

A software engineer in the greater Seattle area
