Work with Excel files using Python
Python can be used to do lots of things, from building websites, games and desktop apps. It also has a wide range of practitioners, from application developers, system engineers, hackers (all kinds of hat) and just casual tinkerers. It’s the tinkerers (the digital finance innovators) who are the audience of this information
Tinkering with Python covers a range of tasks; from scraping web pages, adding entries to CSV files, sending email and SMS messages or working with Excel files — which is the topic of this article.
Now, why would you use Python to work with Excel files? Excel is programmable via VBA or .NET, wouldn’t those languages be the “go to” for Excel manipulation? Well, yes, but Python is a lot easier to use than C# (even VBA) and also because Python is cooler. Kidding aside, Python is a good choice (almost all of the time, the superior choice) for working with Excel files because of the sheer number of libraries available for Python and the sheer number of Pythonistas on the web. If you get in trouble working on anything Python, there is no shortage of information that can help you dig yourself out of a hole.
Let’s get cracking then. To work with Excel files, you will need the openpyxl package. You can get this either by easy_install or pip, whichever is your preference. From a command line, you can do it with either
easy_install openpyxl
or
pip openpyxl
Alternatively, if you happen to be using PyCharm (a really good Python editor from JetBrains, they have Community Edition, which is what I’m using). Once you use openpyxl in an import statement and PyCharm senses that you don’t have the package yet, it will offer to download it for you.
Moving on. Once you have openpyxl downloaded, you can start using it, like this (shown in listing 1)
Listing 1. hello_openpyxl.py
from openpyxl import Workbook workbook = Workbook() sheet = workbook.active sheet["A1"] = "hello" sheet["B1"] = "world!" workbook.save(filename="hello_world.xlsx")
The sample code above creates a file called “hello_world.xslx” on the same folder where you’re running the code. Figure 1 shows the resulting Excel file.
Figure 1. hello_world.xlsx
Let’s work with some data. The data I used for the examples here were downloaded from contextures.com, you can download that same data from here https://www.contextures.com/xlSampleData01.html.
We’ll be working with the 2nd sheet of this workbook, here’s what it looks like.
Figure 2. SalesOrders sheet
To read data from an existing Excel file, we can do it with the following code.
from openpyxl import load_workbook wb = load_workbook("sampledata.xlsx")
The load_workbook() function should take care of loading the Excel file into memory. To get to the sheetnames, you can do it with this code
print(wb.sheetnames)
When an Excel file is first loaded to memory, the default worksheet is already selected, and that sheet is usually the first sheet; as shown in the following snippet
sheet = wb.active print(sheet) # prints ["Instructions"]
Our data isn’t on the [‘Instructions’] worksheet (which is the first, which is also the default sheet). So, we need to select the SalesOrders worksheet before we can proceed any further. Selecting a worksheet can be done as follows;
sheet = wb['SalesOrders'] print(sheet) #prints <WorkSheet "SalesOrder">
Now we can start selecting some data. To get the value of a specific cell, we need to specify the cell address and use its value property, like this
print(sheet['A2'].value) prints 2019-01-06 00:00:00
And of course you’re not limited to simply printing these data, you can assign them to variables for later use, like this
# collect the data as a string data = "Rep: %s | Item: %s" % (sheet['C1'].value, sheet['D1'].value)
# then print them print(data)
Reading values from specific cells is already useful, but what’s more useful is that we can iterate through data in the sheets. The following example.
# this goes through the whole set of data for row in sheet.rows: print(row[2].value, row[3].value, row[4].value)
This targets the data as shown in figure 3.
Figure 3. Selected columns
The previous code snippet goes through the entire records but selects only columns “Rep”, “Items” and “Units” — it’s zero-based, so if you’d like to print the OrderDate, you should choose row[0] instead (column A of the worksheet).
We can also use Python’s generators to walk through specific sets of data. For our example, let’s use the iter_rows() function. To produce the same result as our previous code snippet, we can also write the following code
for data in sheet.iter_rows(min_col=3, max_col=5, min_row=2 values_only=True): print(data)
The iter_rows() function takes the following parameters:
- min_col. the starting column you’d like to get
- max_col. The ending column you’d like to get
- min_row. The starting row you’d like to get. If you don’ specify this, you’ll get the very first row, which is the header. I didn’t want that, so, I specified min_row=2
- max_row. The ending row you’d like to get. If you don’t specify this, the loop will walk up until the last row. Which is what I wanted, so, I didn’t pass this parameter
- values_only. Set this true if you want to get to the actual value, otherwise, you’ll get the cell address, not the value
Our last stop is how to append data to a sheet. The task requires that you 1) open the file using load_workbook() function, like we did in the other examples 2) select the cell you’d like to write to 3) Set the value of that cell and finally 4) Save the spreadsheet using the .save() function of the workbook. The following code snippet shows how it’s done.
from openpyxl import load_workbook # open the spreadsheet and selecting the main sheet wb = load_workbook(filename="hello_world.xlsx") sheet = wb.active # Write what you want into a specific cell sheet["C1"] = "Write exercise ;)" # Save the spreadsheet wb.save(filename="hello_world.xlsx")
Sources:
https://www.knowledgehut.com/blog/programming/how-to-work-with-excel-using-python
https://realpython.com/openpyxl-excel-spreadsheets-python/
https://automatetheboringstuff.com/chapter12/
https://xlsxwriter.readthedocs.io/getting_started.html
https://stackoverflow.com/questions/49271943/how-to-print-all-column-values-from-excel-using-python
Where I downloaded the data from:
https://www.contextures.com/xlSampleData01.html
Assuming it’s Python 3.
Install openpyxl, either you install using pip, easy_install or if you are using PyCharm, the IDE would offer to install it for you automatically.
Great openpyxl example: https://realpython.com/openpyxl-excel-spreadsheets-python/
Responses