Friday, April 22, 2022

File processing

Most programming languages have built-in facilities for opening and processing files. In this next set of posts, I'll highlight Python's simple file input/output methods for handling text files.

Most of these facilities can be used for processing input from a variety of sources, such as reading from a URL. And we will do that at the end of this set of posts. For now, let's just get started with a simple set of exercises.

We'll be using the following file:


It's a plain text file, so it can be read by almost all programs that read files. This particular file has 14 lines. Using the vi editor, you can display line numbers, and so see the line count, by typing the following:

:set number

or

:set nu

Make sure that you're not in INSERT mode (i.e., the word INSERT does not appear at the bottom left corner of the editor window).

Now onto programming.

Our file is named test.txt. In Python, we assign a file variable using the open() function. We will then use this file variable to work with the contents of the file, such as reading from or writing to it.

Start your Python editor and type in the following. Make sure that you are in the same folder in which your file is located.


The file is opened using the following code:

f = open('test.txt', 'r')

f is the file handle, or variable, that we will use to manipulate the file contents.

open() is the function used to open the file. Here it takes two arguments: the first is the name of the file you want to open, test.txt, and the second is the mode in which you want to open the file. The mode can be:

r = read only
w = write only (the file is created if it does not exist; if it exists, it's truncated)
a = append (the file is created if it does not exist)

Now that we've opened the file, let's read the first line and display its contents.
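
Here is a minimal sketch of this step. The contents of test.txt are not reproduced in this post, so the sketch first writes a small stand-in file into a temporary folder. The three sample lines and the variable name path are assumptions made for illustration.

```python
import os
import tempfile

# Stand-in for test.txt (an assumption; the real file's contents are not shown).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as sample:
    sample.write('first line\nsecond line\nthird line\n')

f = open(path, 'r')      # open the file for reading
line = f.readline()      # read up to and including the first newline
print(line)              # display the first line
f.close()
```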















We used the readline() function to read from the file handle f.

We put the contents returned by readline() into the variable line.

Then we used the print() function to display the contents of the variable line.

Note the format for reading from the file into the variable.

line = f.readline()

Now let's read the next line and display its contents.
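
A sketch of reading two lines in a row might look like this. As before, the stand-in file and its three sample lines are assumptions, since the real test.txt is not shown here.

```python
import os
import tempfile

# Stand-in for test.txt (an assumption; the real file's contents are not shown).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as sample:
    sample.write('first line\nsecond line\nthird line\n')

f = open(path, 'r')
line = f.readline()      # first call returns the first line
print(line)
line = f.readline()      # second call picks up where the first one stopped
print(line)
f.close()
```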


















We call readline() again to read the next line.

We reuse the same variable, line, to hold the contents of the next line.

The file handle keeps track of its position in the file, so after we read the first line, the next read automatically starts at the second one.

Now let's go to the beginning of the file and use a for loop to read all the lines and display them.
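
The following sketch shows one way to do this. It assumes the loop iterates directly over the file handle, which may differ from the exact loop used in the original example, and it uses an assumed three-line stand-in for test.txt.

```python
import os
import tempfile

# Stand-in for test.txt (an assumption; the real file's contents are not shown).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as sample:
    sample.write('first line\nsecond line\nthird line\n')

f = open(path, 'r')
f.readline()             # move past the first two lines,
f.readline()             # as in the earlier steps

f.seek(0)                # go back to the beginning of the file
shown = []
for line in f:           # each pass of the loop reads one line
    print(line)          # print() adds its own newline after the line's newline
    shown.append(line)
f.close()
```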


































To go to the start of the file, we use the seek() method. Its argument is the position in the file that we want to go to. In our case, we use the argument 0, which means the beginning of the file.

Notice that the output looks longer than the file we have. This is because the print() function automatically adds a newline after it writes the line to the screen, but each line read from the file also ends with a newline character. So the output looks like it's double-spaced.

We can remove the newline character from the line before we print it to the screen like this.
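
A sketch of the same loop with the newline stripped follows. The stand-in file and the list named cleaned are assumptions added so the result can be inspected.

```python
import os
import tempfile

# Stand-in for test.txt (an assumption; the real file's contents are not shown).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as sample:
    sample.write('first line\nsecond line\nthird line\n')

f = open(path, 'r')
cleaned = []
for line in f:
    line = line.strip()  # remove leading and trailing whitespace, including the newline
    print(line)          # only print()'s own newline remains, so no double spacing
    cleaned.append(line)
f.close()
```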





















The statement that removes the newline character from the line is:

line = line.strip()

What this does is remove leading and trailing whitespace. A newline character, or spaces, or tabs at the front or end of the line would be stripped.

Now let's close the file and do some writing exercises.
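
Closing is a single call to close(). The sketch below uses an assumed stand-in file and also checks the handle's closed attribute to confirm that the file is no longer open.

```python
import os
import tempfile

# Stand-in for test.txt (an assumption; the real file's contents are not shown).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as sample:
    sample.write('first line\nsecond line\n')

f = open(path, 'r')
print(f.readline().strip())
f.close()                # release the file
print(f.closed)          # the handle reports that it is closed
```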





To open a file for writing, which will also create the file, do the following:
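
A minimal sketch, assuming the new file is called writing-file.txt as in the text below. It is created in a temporary folder here so that the sketch cannot overwrite anything you already have; the existed_before and exists_after names are only for illustration.

```python
import os
import tempfile

# A temporary folder keeps this sketch away from your real files.
path = os.path.join(tempfile.mkdtemp(), 'writing-file.txt')

existed_before = os.path.exists(path)   # the file is not there yet
f = open(path, 'w')                     # 'w' creates it (or truncates an existing file)
exists_after = os.path.exists(path)     # now it exists, and it is empty
print(existed_before, exists_after)
f.close()
```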




If the file writing-file.txt already exists, Python will truncate it, which erases its contents. So be careful not to use the name of a file that you already have. For example, if we had used the name test.txt, the contents of the file that we used for the reading exercise would have been erased.

Now let's write a few lines into the new file.
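
Writing is done with the write() method. In this sketch the text of the four lines is made up, and the file is closed at the end only so that the sketch is complete on its own; in the walkthrough the file stays open for the next step.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writing-file.txt')
f = open(path, 'w')

# write() does not add a newline for us, so each string ends with '\n'.
f.write('This is line one.\n')
f.write('This is line two.\n')
f.write('This is line three.\n')
f.write('This is line four.\n')

f.close()   # closing makes sure everything is saved to disk
```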






We've written four lines into our new file. Now let's read them.
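
The sketch below reproduces the situation under the assumption that the file is still open in 'w' mode. It catches the error so that the script keeps running; the message variable is only there for illustration.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writing-file.txt')
f = open(path, 'w')
f.write('This is line one.\n')

message = None
try:
    f.readline()                         # reading from a write-only handle fails
except io.UnsupportedOperation as err:
    message = str(err)
    print('Error:', message)

f.close()
```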










Wow! What happened? The error message indicates that the file was not opened for reading. There's a way to open a file for both reading and writing, but the mode we used, 'w', is for writing only.

Let's close the file and open it for both reading and writing.
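
One way to do this is the 'r+' mode, which opens an existing file for both reading and writing without truncating it. This is a sketch under that assumption; the sample lines are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'writing-file.txt')
f = open(path, 'w')
f.write('This is line one.\n')
f.write('This is line two.\n')
f.close()                      # close the write-only handle first

f = open(path, 'r+')           # reopen for both reading and writing
lines = f.readlines()          # reading now works
f.seek(0, os.SEEK_END)         # make sure we are at the end before writing
f.write('This is line three.\n')
f.seek(0)                      # back to the start to read everything
all_lines = f.readlines()
f.close()
print(all_lines)
```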


Dictionaries

Python has a dictionary data type.

The dictionary data type stores items as KEY: VALUE pairs. This is unlike a list, which stores items by their position:

a[0], a[1], a[2], a[3]...

A dictionary is not indexed by position. The programmer determines what the key is, and assigns a value to it.

Consider the following list:
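
The list can be built with the statements that are explained just below:

```python
l = list()             # start with an empty list

for x in range(10):    # x takes the values 0 through 9
    l.append(x)        # add each value to the end of the list

print(l)
```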












We create an empty list:

l = list()

We then append elements to the list. Recall that list positions are numbered from zero.

for x in range(10):
    l.append(x)

The range() function produces a sequence of integers. In our example, range(10) produces the integers 0 through 9, so the list ends up as:
[0,1,2,3,4,5,6,7,8,9]

Now let's remove the third element (l[2], since list positions are numbered from zero).
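
A sketch of this step, assuming the removal is done with pop(2). The removed variable is only there to show what pop() hands back.

```python
l = list(range(10))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

removed = l.pop(2)     # remove and return the element at index 2
print(removed)
print(l)
```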








Notice how the item at position 2 is gone.

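Strictly speaking, pop(i) removes and returns the element at index i, while remove(x) removes the first element whose value is x and raises an error if that value is not in the list. In the example above the two coincide, because the element at index 2 also has the value 2. Another way to remove the element at index 2 is to use the del statement.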
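
The sketch below applies del to our list. It also uses a made-up list of letters to show the difference between removing by index and removing by value, which is hard to see when the values and the positions are the same numbers.

```python
l = [0, 1, 3, 4, 5, 6, 7, 8, 9]
del l[2]                   # delete whatever is at index 2 (the value 3)
print(l)

letters = ['a', 'b', 'c', 'd']   # hypothetical list, for illustration only
letters.pop(2)             # by index: removes 'c'
letters.remove('a')        # by value: removes 'a'
print(letters)
```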








In the example above, the element with value "3" is in the index position [2].

The statement:

del l[2]

deletes the "3"

[0, 1, 3, 4, 5, 6, 7, 8, 9]

becomes

[0, 1, 4, 5, 6, 7, 8, 9]

Now let's insert the "3" and the "2" back.
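
Starting from the list as it stands after the deletions, a sketch of putting the "3" back looks like this:

```python
l = [0, 1, 4, 5, 6, 7, 8, 9]

l.insert(2, 3)     # at position 2, insert the value 3
print(l)
```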









The insert(Pos, Value) function takes two arguments. Pos = the position (zero-based) where we will insert the element, and Value = the value of the element that we're inserting.

And now for the "2"
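
Continuing from the previous result, a sketch of this step:

```python
l = [0, 1, 3, 4, 5, 6, 7, 8, 9]

l.insert(2, 2)     # at position 2, insert the value 2
print(l)
```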








The function call l.insert(2,2) will insert the value 2 at position 2.

DICTIONARIES are different. They are not based on indexed positions. Each entry in a dictionary has a key that marks where it is. The value is the actual element.

We can create an empty dictionary the following way:
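
A short sketch of creating an empty dictionary with dict():

```python
d = dict()         # create an empty dictionary

print(d)
print(len(d))      # an empty dictionary has no entries
```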







Just like the list() function creates an empty list, the dict() function creates an empty dictionary. We can also create an empty list the following way:

l = []

And we can create an empty dictionary the following way:

d = {}

A dictionary is written with curly braces (also known as curly brackets) to indicate that it's a dictionary.

Now let's add some elements to our empty dictionary. Remember, we need to supply both a key and a value, unlike a list, which only requires the element that you want to insert or append. A dictionary is not indexed by position, although since Python 3.7 it does remember the order in which keys were inserted. We will see later how to sort dictionaries.
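
Here is a sketch of adding entries one at a time with the d[key] = value form. The particular keys and values are made up for illustration.

```python
d = {}

d['one'] = 1           # string key, number value
d[2] = 'two'           # number key, string value
d['color'] = 'blue'    # string key, string value
d[99] = 'apple'        # the key and the value need not be related

print(d)
```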










Notice a couple of things:

  • The key can be a string, a number, or any other hashable value.
  • The value can be anything, such as a string or a number.
  • The key does not have to be related to the value.

A dictionary has an update() method if you want to add more than one key:value pair at once.

Let's see how this works:
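
A sketch of update() in action follows. The two starting entries are assumptions; the update() call itself is the one discussed below.

```python
d = {'one': 1, 2: 'two'}   # hypothetical starting contents

# Add four key:value pairs in a single call.
d.update({4: 'four', 5: 'five', 'holiday': 'Kwanzaa', 'lunari': 'X'})

print(d)
print(len(d))
```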










Note the statement:

d.update({4: 'four', 5: 'five', 'holiday': 'Kwanzaa', 'lunari': 'X'})

This adds four elements. The keys are:

  • 4
  • 5
  • holiday
  • lunari

The values of those keys are:

  • four
  • five
  • Kwanzaa
  • X

It's probably best practice to use the update() method for all inserts so that you get used to it for single or multiple inserts.

Since a dictionary is not indexed by position, it provides a keys() method to get all of its keys.

This is how it works:
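
A sketch using the four entries from the update() example. Note that in Python 3, keys() returns a view of the keys rather than a plain list, so the sketch wraps it in list() when a real list is wanted.

```python
d = {4: 'four', 5: 'five', 'holiday': 'Kwanzaa', 'lunari': 'X'}

keys = d.keys()            # a view of the dictionary's keys
print(keys)

key_list = list(keys)      # convert the view to an ordinary list
print(key_list)
```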





And the dictionary object also has a values() method that retrieves all the values. It works like this:
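
A matching sketch for values(), again using the entries from the update() example. Like keys(), it returns a view, which list() turns into an ordinary list.

```python
d = {4: 'four', 5: 'five', 'holiday': 'Kwanzaa', 'lunari': 'X'}

values = d.values()        # a view of the dictionary's values
print(values)

value_list = list(values)  # convert the view to an ordinary list
print(value_list)
```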




The keys() method is typically used in a loop to print out all the keys and their values. Like this:
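
A sketch of such a loop, assuming the dictionary holds the four entries from the update() example:

```python
d = {4: 'four', 5: 'five', 'holiday': 'Kwanzaa', 'lunari': 'X'}

for key in d.keys():             # visit each key in turn
    print(key, ': ', d[key])     # look up the value that belongs to the key
```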












The loop iterates through the keys of the dictionary. Inside the loop, the following statement runs:

print(key, ': ', d[key])

The print() function prints the items passed to it, separated by spaces, and adds a newline at the end.