How to read a file in Python | plain txt
If you are wondering how to read a file in Python, you’re in the right place. You might think that it is difficult, but Python actually has built-in functions that are fairly intuitive. Before we get to reading files in Pandas, let’s browse through some even more basic options.
The easiest way to start is to open the file with the built-in open() function and assign the result to a variable. We’ll use mode='r' to signal that we only want to read the file, not update it.
    # Open a file
    file = open('reading_simple_files\\something.txt', mode='r')

    # Print it
    print(file.read())

    # Check whether file is closed
    print(file.closed)

    # Close file
    file.close()

    # Check whether file is closed
    print(file.closed)
With this approach the file is not closed automatically. You have to call the close() method yourself, and it is important that you do so once you’re done with the file.
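If something goes wrong between open() and close(), the close() call never runs. One common way around that is a try/finally block. Below is a minimal sketch of the idea; the sample file and its contents are made up here so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'something.txt')
sample = open(path, mode='w')
sample.write('Hello, file!\n')
sample.close()

file = open(path, mode='r')
try:
    content = file.read()
    print(content)
finally:
    # This runs whether or not read() raised an error
    file.close()

print(file.closed)  # True
```

The finally branch is what guarantees the close, even when reading fails halfway through.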
If your text file is a bit bigger, you can read it line by line. Just use readline(), as in the example below.
    # Read & print the first 3 lines
    with open('reading_simple_files\\sth_bigger.txt') as file:
        print(file.readline())
        print(file.readline())
        print(file.readline())
Notice that here we don’t have to close the file. Because we opened it using with (what Python calls a context manager), it is closed automatically as soon as we leave the with block.
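Calling readline() over and over gets tedious quickly. A file object can also be looped over directly, one line at a time. Here is a rough sketch that takes the first three lines with itertools.islice and then confirms the file got closed. The file and its contents are invented for the example.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample file, created here only so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sth_bigger.txt')
with open(path, mode='w') as f:
    f.write('first\nsecond\nthird\nfourth\n')

with open(path) as file:
    # Iterating over a file yields one line at a time; islice stops after three
    first_three = [line.rstrip('\n') for line in islice(file, 3)]

print(first_three)  # ['first', 'second', 'third']
print(file.closed)  # True, the with block closed the file for us
```

Because lines are read lazily, this pattern also works for files too big to fit comfortably in memory.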
How to read a file in Python | tabular csv
As a data analyst, you will encounter tabular data more often than plain text files. A CSV file is an example of a flat file: everything sits in one big table, with no relations between tables like you would have in a database. Just one big-ass table (ok, sometimes those files can be small too).
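To make "one big table" concrete, here is a small sketch using the csv module from Python's standard library. The data is made up and fed in through io.StringIO instead of a real file, purely so the snippet runs as is.

```python
import csv
import io

# Made-up sample data standing in for a CSV file on disk
raw = "name,age,survived\nAlice,29,1\nBob,41,0\n"

# DictReader uses the first row as column names
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

print(reader.fieldnames)  # ['name', 'age', 'survived']
print(rows[0]['name'], rows[0]['age'])  # Alice 29
```

Note that every value comes back as a string, so any type conversion is up to you.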
There are plenty of ways to read a CSV file in Python. One of them is to use NumPy and its recfromcsv() function. Let’s use it now to read the data on the Titanic’s passengers.
    # Read csv with NumPy
    import numpy as np

    # Assign the filename
    file = 'reading_simple_files\\titanic.csv'

    # Import file using np.recfromcsv: d
    d = np.recfromcsv(file, encoding=None)

    # Print out first two records of d
    print(d[:2])
Most of the default arguments of recfromcsv() are set so that it reads CSV files with no problems. For example, the delimiter is a comma by default. To avoid warnings I also specified encoding as None. The result is something you’ll hear about often: the famous NumPy array (here, a record array whose columns you can access by name).
Oh, wait, the double backslash in the file path? Well, if I had left only one, Python would have read \t as a tab character. The extra backslash escapes that special character, so we end up with a literal backslash.
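Doubling the backslash is not the only way out. Here is a quick sketch of a few alternatives: raw strings, forward slashes (which open() also accepts on Windows), and letting os.path or pathlib build the path. The folder and file names simply mirror the ones used in this article.

```python
import os
from pathlib import Path

escaped = 'reading_simple_files\\titanic.csv'  # backslash escaped by hand
raw = r'reading_simple_files\titanic.csv'      # raw string: backslashes stay literal
broken = 'reading_simple_files\titanic.csv'    # \t silently turns into a tab

print(escaped == raw)  # True
print('\t' in broken)  # True, this is not the path we wanted

# Options that avoid backslashes entirely
forward = 'reading_simple_files/titanic.csv'
joined = os.path.join('reading_simple_files', 'titanic.csv')
as_path = Path('reading_simple_files') / 'titanic.csv'

print(as_path.name)  # titanic.csv
```

The last two options pick the right separator for whatever operating system the code runs on.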
As for recfromcsv(), you can learn more about its arguments by checking its more general-purpose version, genfromtxt(). Check the NumPy documentation for details.
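For comparison, here is roughly what the same kind of read looks like with genfromtxt(). It needs a few arguments spelled out that recfromcsv() sets for you. As far as I know, NumPy 2.0 dropped recfromcsv() from the main namespace and points to genfromtxt() instead, so this version may be the safer bet on newer installs. The data below is made up and passed in through io.StringIO so the sketch runs without the Titanic file.

```python
import io
import numpy as np

# Made-up sample data standing in for the CSV file
raw = "PassengerId,Survived,Age\n1,0,22.0\n2,1,38.0\n"

d = np.genfromtxt(
    io.StringIO(raw),
    delimiter=',',   # recfromcsv() uses a comma by default
    names=True,      # take column names from the first row
    dtype=None,      # guess a type for each column
    encoding=None,
)

print(d.dtype.names)
print(d['Age'])
```

One difference to keep in mind: recfromcsv() lowercases the column names by default, while genfromtxt() keeps them as they appear in the file.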
Reading files in Pandas
Ah, Pandas. The place where we all begin and where most of us end. Reading a CSV file in Pandas is absurdly easy. Check this out:
    # Import pandas as pd
    import pandas as pd

    # Read the file into a DataFrame: df
    df = pd.read_csv('reading_simple_files\\titanic.csv')

    # View the head of the DataFrame
    print(df.head())
What you get as a result is a structure called a DataFrame (it’s a close relative of R’s data frame). It’s perfect for reading columns with different data types, handling missing values, and, most importantly, doing advanced data analysis that includes joins, merges, statistics, you name it.
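Here is a tiny sketch of those three things in action: mixed column types, a missing value, and a merge. All the data, including the classes lookup table, is invented for the example.

```python
import io
import pandas as pd

# Made-up passenger data; Bob's age is missing
raw = "PassengerId,Name,Age,Pclass\n1,Alice,29,1\n2,Bob,,3\n3,Carol,41,1\n"
df = pd.read_csv(io.StringIO(raw))

# Each column gets its own data type
print(df.dtypes)

# The empty Age field is read as NaN
print(df['Age'].isna().sum())  # 1

# Hypothetical lookup table to join against
classes = pd.DataFrame({'Pclass': [1, 2, 3],
                        'ClassName': ['First', 'Second', 'Third']})
merged = df.merge(classes, on='Pclass', how='left')
print(merged[['Name', 'ClassName']])

# Statistics skip missing values by default
print(df['Age'].mean())  # 35.0
```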
Of course, Pandas has all sorts of cool parameters you can set, such as how to treat headers, missing values, comments, and much more. Be sure to check the Pandas documentation on read_csv() for that.
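To give a taste, here is a small sketch that uses a few of those read_csv() parameters together: sep, comment, na_values and header. The messy sample data is made up to show them off.

```python
import io
import pandas as pd

# Made-up data: semicolon-separated, with a comment line and a custom missing-value marker
raw = "# made-up passenger data\nname;age;fare\nAlice;29;7.25\nBob;unknown;71.28\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=';',                # columns are separated by semicolons
    comment='#',            # ignore anything after a '#'
    na_values=['unknown'],  # treat 'unknown' as a missing value
    header=0,               # first non-comment row holds the column names
)

print(df)
print(df['age'].isna().tolist())  # [False, True]
```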
I hope you now know how to read a file in Python. Or at least know a little more about it. You can find code examples mentioned in the article on my GitHub repo.
Peace Data Friend!