How to read a file in Python | plain txt
If you are wondering how to read a file in Python, you’re in the right place. You might think it is difficult, but Python actually has built-in functions that are fairly intuitive. Before we get to reading files in Pandas, let’s browse through some even more basic options.
The easiest way to start is to call the built-in open() function and assign the file object it returns to a variable. We’ll use mode='r' to signify that we only want to read the file and not update it.
# Open the file in read-only mode
file = open('reading_simple_files\\something.txt', mode='r')
# Print its contents
print(file.read())
# Check whether the file is closed (False: it is still open)
print(file.closed)
# Close the file
file.close()
# Check again (True: the file is now closed)
print(file.closed)
With this method the file is not closed automatically. You have to call the close() method yourself, and it is important that you do close the file once you’re done with it.
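If something goes wrong between open() and close(), the file can stay open. One common way to guard against that is a try/finally block. The sketch below is only an illustration: it creates its own throwaway file (the name demo.txt is made up) so that it runs without any of the article’s data.

```python
import os
import tempfile

# Hypothetical demo file, created only so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
setup = open(path, mode='w')
setup.write('first line\nsecond line\n')
setup.close()

file = open(path, mode='r')
try:
    content = file.read()
    print(content)
finally:
    # This runs even if read() raises, so the file always gets closed
    file.close()

# Confirm that the file is closed
print(file.closed)
```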
If your text file is a bit bigger, you can read it line by line. Just use readline() like in the example below.
# Read & print the first 3 lines
with open('reading_simple_files\\sth_bigger.txt') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())
Notice that here we don’t have to close the file. Because we opened it using with (what Python calls a context manager), the file is closed automatically once we leave the with block.
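Calling readline() three times works, but a file object can also be looped over directly, one line at a time. Here is a rough sketch of that idea. It assumes a made-up five-line demo file created on the spot, and it uses itertools.islice to stop after three lines.

```python
import os
import tempfile
from itertools import islice

# Hypothetical demo file with five lines, created so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo_lines.txt')
with open(path, mode='w') as f:
    f.write('\n'.join(f'line {i}' for i in range(1, 6)) + '\n')

first_three = []
with open(path) as file:
    # A file object yields one line per iteration; islice stops after three
    for line in islice(file, 3):
        first_three.append(line.rstrip('\n'))  # drop the trailing newline
        print(first_three[-1])

# Outside the with block the file has already been closed for us
print(file.closed)
```

Each line read from a file keeps its trailing newline, which is why print(file.readline()) leaves a blank line after every printed line. Stripping the newline, as above, avoids that.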
How to read a file in Python | tabular csv
As a data analyst, you will encounter tabular data more often than plain text files. A CSV file is an example of a flat file: everything sits in one big table, with no relations between tables like in a database. Just one big ass table (ok, sometimes those files can be small too).
There are plenty of ways to read a CSV file in Python. One of them is to use Numpy and its recfromcsv() function. Let’s use it now to read the data on Titanic’s passengers.
# Read csv with Numpy
import numpy as np
# Assign the filename
file = 'reading_simple_files\\titanic.csv'
# Import file using np.recfromcsv: d
d = np.recfromcsv(file, encoding=None)
# Print out first two records of d
print(d[:2])
Most of the default arguments of recfromcsv() are set so that it reads typical CSV files with no problems. For example, the delimiter is set to a comma by default. To avoid warnings I also specified encoding as None. The result is a record array, a flavor of something you’ll hear about often: the famous Numpy array.
Oh, wait, the double backslash? Well, if I had left only one, Python would have read \t as a tab character. The extra backslash lets me escape that special character, the backslash itself.
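Doubling the backslash is not the only way around this. The short sketch below compares it with two alternatives that are standard Python but not part of the original example: a raw string and pathlib. No file is opened here; only the path strings are compared.

```python
from pathlib import Path

escaped = 'reading_simple_files\\titanic.csv'   # doubled backslash
raw = r'reading_simple_files\titanic.csv'       # raw string: backslashes stay literal
unescaped = 'reading_simple_files\titanic.csv'  # \t silently becomes a tab character

print(escaped == raw)      # True
print('\t' in unescaped)   # True
print('\t' in escaped)     # False

# pathlib joins the parts with the right separator for the current OS
portable = Path('reading_simple_files') / 'titanic.csv'
print(portable.name)       # titanic.csv
```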
As for recfromcsv(), you can learn more about its arguments by checking its more general-purpose version, genfromtxt(). Check its documentation for the details.
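For comparison, here is a minimal sketch of calling genfromtxt() directly with the settings that recfromcsv() applies by default. The data is a small made-up sample held in memory, not the real Titanic file, and the column names are only illustrative. As far as I know, recent Numpy releases (2.0 onward) no longer expose recfromcsv() at the top level, so this form may be the safer one to learn.

```python
import io
import numpy as np

# Made-up sample standing in for a CSV file on disk
csv_text = io.StringIO(
    "PassengerId,Survived,Age\n"
    "1,0,22.0\n"
    "2,1,38.0\n"
    "3,1,26.0\n"
)

d = np.genfromtxt(
    csv_text,
    delimiter=',',   # split fields on commas
    names=True,      # take the column names from the first row
    dtype=None,      # let Numpy guess a type for each column
    encoding=None,   # same setting as in the recfromcsv() example
)

# The result is a structured array: columns are accessed by name
print(d.dtype.names)
print(d['Age'])
```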
Reading files in Pandas
Ah, Pandas. The place where we all begin and where most of us end. Reading a CSV file in Pandas is absurdly easy. Check this out:
# Import pandas as pd
import pandas as pd
# Read the file into a DataFrame: df
df = pd.read_csv('reading_simple_files\\titanic.csv')
# View the head of the DataFrame
print(df.head())
What you get as a result is a structure called a DataFrame (it’s a close relative of R’s data frame). It’s perfect for reading columns with different data types, handling missing values, and, most importantly, allowing advanced data analysis that includes joins, merging, statistics, you name it.
Of course, Pandas has all sorts of cool parameters you can set, like how to treat headers, missing values, comments, and much more. Be sure to check the Pandas documentation on that.
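To give a feel for those parameters, here is a small sketch using header, na_values and comment. It reads a made-up in-memory sample rather than the Titanic file, and the 'unknown' placeholder for a missing age is an assumption invented for this example.

```python
import io
import pandas as pd

# Made-up sample standing in for a CSV file on disk
csv_text = io.StringIO(
    "# passenger sample, made up for this example\n"
    "PassengerId,Survived,Age\n"
    "1,0,22\n"
    "2,1,unknown\n"
    "3,1,26\n"
)

df = pd.read_csv(
    csv_text,
    comment='#',            # ignore everything after a # on a line
    header=0,               # the first non-comment row holds the column names
    na_values=['unknown'],  # treat this placeholder as a missing value
)

print(df)
# Count the missing values in the Age column
print(df['Age'].isna().sum())
```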
I hope you now know how to read a file in Python. Or at least know a little more about it. You can find code examples mentioned in the article on my GitHub repo.
Peace Data Friend!