TL;DR:
Python is a powerful general-purpose language in broad use. Let's dive in and
learn some control flow tips, standard library tricks, and common pitfalls.
Introduction
Python and its libraries are enormous. The language is used for system automation, web
applications, big data, analytics, and security software. This article aims to
show off some lesser-known tricks to put you on the path to faster development,
easier debugging, and general fun.
As with every language, the real resource you get once you learn it isn't a
language-related superpower. It's the ability to use the idioms, libraries, and
shared knowledge of the Python community.
Exploring Standard Data Types
The Humble enumerate
Iterating over the contents of anything in Python is simple: just write
for foo in bar: and you're off and running.

drinks = ["coffee", "tea", "milk", "water"]
for drink in drinks:
    print("thirsty for", drink)
#thirsty for coffee
#thirsty for tea
#thirsty for milk
#thirsty for water
But it's common to want the index of each item as well as the item itself.
Programmers often reach for len() and range() to iterate over a list by
index, but there's an easier way.

drinks = ["coffee", "tea", "milk", "water"]
for index, drink in enumerate(drinks):
    print("Item {} is {}".format(index, drink))
#Item 0 is coffee
#Item 1 is tea
#Item 2 is milk
#Item 3 is water

The enumerate builtin yields both the index and the item itself.
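For comparison, here is a small sketch of the len()/range() pattern next to enumerate, including the optional second argument for one-based numbering. The drinks list is the one from above; the names by_index, by_enumerate, and numbered exist only for this illustration.

```python
drinks = ["coffee", "tea", "milk", "water"]

# The index-based loop many programmers reach for first
by_index = []
for i in range(len(drinks)):
    by_index.append((i, drinks[i]))

# enumerate yields the same (index, item) pairs without manual indexing
by_enumerate = list(enumerate(drinks))
print(by_index == by_enumerate)
# True

# A second argument sets the starting number, handy for human-facing output
numbered = ["{}. {}".format(n, drink) for n, drink in enumerate(drinks, 1)]
print(numbered)
# ['1. coffee', '2. tea', '3. milk', '4. water']
```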
A member of set
A surprising number of concepts can boil down to operations on a set. Need to
make sure a list doesn't have duplicates? Need to see what two lists have in
common? Python comes with a set type to make these operations fast and
readable.

# deduplicate a list *fast*
print(set(["ham", "eggs", "bacon", "ham"]))
# {'bacon', 'eggs', 'ham'} (sets are unordered, so the order may vary)
# compare lists to find differences/similarities
# {} without "key":"value" pairs makes a set
menu = {"pancakes", "ham", "eggs", "bacon"}
new_menu = {"coffee", "ham", "eggs", "bacon", "bagels"}
new_items = new_menu.difference(menu)
print("Try our new", ", ".join(new_items))
# Try our new bagels, coffee
discontinued_items = menu.difference(new_menu)
print("Sorry, we no longer have", ", ".join(discontinued_items))
# Sorry, we no longer have pancakes
old_items = new_menu.intersection(menu)
print("Or get the same old", ", ".join(old_items))
# Or get the same old eggs, bacon, ham
full_menu = new_menu.union(menu)
print("At one time or another, we've served:", ", ".join(full_menu))
# At one time or another, we've served: coffee, ham, pancakes, bagels, bacon, eggs
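The same comparisons can also be written with set operators, which some readers find easier to scan. This sketch reuses the two menus from above and wraps the results in sorted(), since sets have no guaranteed order and the printed order can otherwise change between runs.

```python
menu = {"pancakes", "ham", "eggs", "bacon"}
new_menu = {"coffee", "ham", "eggs", "bacon", "bagels"}

new_items = new_menu - menu            # same as new_menu.difference(menu)
discontinued_items = menu - new_menu   # same as menu.difference(new_menu)
old_items = new_menu & menu            # same as new_menu.intersection(menu)
full_menu = new_menu | menu            # same as new_menu.union(menu)

print("Try our new", ", ".join(sorted(new_items)))
# Try our new bagels, coffee
print("Or get the same old", ", ".join(sorted(old_items)))
# Or get the same old bacon, eggs, ham
```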
The intersection method compares all the items and returns only the items
both sets have in common. In this case, those are the breakfast staples of
bacon, eggs, and ham.

collections.namedtuple
When you don't need to attach methods to a class, but still want the
convenience of foo.prop, look no further than namedtuple. You define the
fields ahead of time, then instantiate a lightweight class that takes less
memory than a full object.

from collections import namedtuple

LightObject = namedtuple('LightObject', ['shortname', 'otherprop'])
# every field needs a value at creation time
m = LightObject(shortname='something', otherprop='something else')
m.shortname = 'athing'
> Traceback (most recent call last):
> AttributeError: can't set attribute
You can't set attributes of a namedtuple, just like you can't change members
of a tuple. You need to set attributes when you instantiate your namedtuple.

LightObject = namedtuple('LightObject', ['shortname', 'otherprop'])
n = LightObject(shortname='something', otherprop='something else')
n.shortname # something
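Because instances are immutable, the usual way to "change" a field is to build a new instance. Here is a minimal sketch using the _replace and _asdict helpers that namedtuple generates; the field values are made up for illustration.

```python
from collections import namedtuple

LightObject = namedtuple('LightObject', ['shortname', 'otherprop'])
n = LightObject(shortname='something', otherprop='something else')

# _replace returns a new instance; the original is left untouched
renamed = n._replace(shortname='athing')
print(renamed.shortname)  # athing
print(n.shortname)        # something

# namedtuples are still tuples, so unpacking and indexing work
shortname, otherprop = n
print(n[1])               # something else

# _asdict converts to a mapping, e.g. before serializing to JSON
print(dict(n._asdict()))
# {'shortname': 'something', 'otherprop': 'something else'}
```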
collections.defaultdict
It's not uncommon to see logic like this in a Python app, where it's expected that a key won't exist initially.

login_times = {}
for t in logins:
    if login_times.get(t.username, None):
        login_times[t.username].append(t.datetime)
    else:
        login_times[t.username] = [t.datetime]
With defaultdict you can skip this logic by making any access to an undefined
key return an empty list (or any other type).

import collections

login_times = collections.defaultdict(list)
for t in logins:
    login_times[t.username].append(t.datetime)
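Here is a self-contained version of the same idea. The Login tuple and the sample records are hypothetical stand-ins for whatever your application's login data looks like.

```python
from collections import defaultdict, namedtuple

# Hypothetical login record; real code would read these from a log or database
Login = namedtuple('Login', ['username', 'datetime'])
logins = [
    Login('ann', '2016-01-01 09:00'),
    Login('bob', '2016-01-01 09:05'),
    Login('ann', '2016-01-02 08:55'),
]

login_times = defaultdict(list)
for t in logins:
    # a missing key is created with list() the first time it is accessed
    login_times[t.username].append(t.datetime)

print(login_times['ann'])
# ['2016-01-01 09:00', '2016-01-02 08:55']

# Caveat: merely reading a missing key also creates it
login_times['carol']
print('carol' in login_times)
# True
```

The caveat at the end is worth remembering: use the in operator or .get() when you only want to check for a key without inserting it.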
You can even use custom classes, since defaultdict accepts any callable that
takes no arguments and returns the default value.

import collections
from datetime import datetime

class Event(object):
    def __init__(self, t=None):
        if t is None:
            self.time = datetime.now()
        else:
            self.time = t

events = collections.defaultdict(Event)
for e in user_events:
    print(events[e.name].time)
To go beyond what defaultdict offers and to set nested keys as attributes,
check out addict.

normal_dict = {
    'a': {
        'b': {
            'c': {
                'd': {
                    'e': 'really really nested dict'
                }
            }
        }
    }
}

from addict import Dict
addicted = Dict()
addicted.a.b.c.d.e = 'really really nested'
print(addicted)
# {'a': {'b': {'c': {'d': {'e': 'really really nested'}}}}}
This snippet is way easier to write than it would be with the standard dict,
but what about defaultdict? Seems like it would be easy enough.

from collections import defaultdict
default = defaultdict(dict)
default['a']['b']['c']['d']['e'] = 'really really nested dict' # fails
That looks ok, but it will actually throw a KeyError exception because
default['a'] is a dict, not a defaultdict. Let's make a defaultdict that
defaults to defaulted dictionaries (say that a couple times fast). The trick
is a default factory that itself returns a defaultdict built the same way.

If you just need a defaulted counter, you can use the collections.Counter class, which provides some convenience functions like most_common.
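A minimal sketch of both ideas, under a couple of assumptions: the helper name nested_dict is made up for this example, and the list of breakfast orders is invented. The default factory returns another defaultdict with the same factory, so every missing level is created on demand.

```python
from collections import Counter, defaultdict

def nested_dict():
    """Return a defaultdict whose missing keys are more nested_dicts."""
    return defaultdict(nested_dict)

default = nested_dict()
default['a']['b']['c']['d']['e'] = 'really really nested dict'
print(default['a']['b']['c']['d']['e'])
# really really nested dict

# Counter is a dict subclass that treats missing keys as a count of 0
orders = ["ham", "eggs", "ham", "bacon", "ham", "eggs"]
counts = Counter(orders)
print(counts.most_common(2))
# [('ham', 3), ('eggs', 2)]
print(counts["pancakes"])
# 0
```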
Control Flow
When learning control structures in Python, it's common to go over for,
while, if-elif-else, and try-except. Properly used, those few control
structures can handle most every case. There's a reason equivalents exist in
almost every language you run across. Python also offers some additions to the
basic structures that aren't often used, but can make your code more readable
and easier to maintain.

Great Exceptations
Using exceptions for flow control is a common pattern when dealing with
databases, sockets, files, or any resource that is likely to fail. With the
standard try and except, something simple like working with a database might
look like this.

try:
    # get API data
    data = db.find(id='foo') # may raise exception
    # manipulate the data
    db.add(data)
    # save it again
    db.commit() # may raise exception
except Exception:
    # log the failure
    db.rollback()
db.close()
Can you spot the problem here? There are two possible exceptions that will
trigger the same except block, meaning that a failure to find the data (or to
connect to find the data) would cause a rollback attempt. This almost
definitely isn't what we want, because a failure at that point wouldn't have
even begun a transaction yet. A rollback also probably isn't the right response
to a connection failure, so let's break these cases apart.

First, we'll handle finding the data.
try:
    # get API data
    data = db.find(id='foo') # may raise exception
except Exception:
    # log the failure and bail out
    log.warning("Could not retrieve FOO")
    return
# manipulate the data
db.add(data)
Now that the data retrieval has its own try-except we can take whatever action
makes sense if we don't have any data to work with. It's not likely our code
will do anything useful without data, so we'll just exit the function. Instead
of exiting you could also make a default object, retry the query, or kill the
entire program.

Now let's wrap the commit so it fails gracefully as well.

try:
    db.commit() # may raise exception
except Exception:
    log.warning("Failure committing transaction, rolling back")
    db.rollback()
else:
    log.info("Saved the new FOO")
finally:
    db.close()
We've actually added two clauses here. First, let's look at the else, which
runs if no exception occurs. In our example, all it does is log that the
transaction succeeded, but you could put more interesting actions in as needed.
One potential use would be to fire off a background job or notification.

The finally clause is there to make it clear that the db.close() will
always run. Looking back, we can see that all the code related to persisting
our data ended up in a nice logical grouping at the same indentation level.
Editing this code later, it will be easy for us to see that all these lines are
tied to the commit.

Context and Control
We've seen control flow using exceptions before. In general, the steps are something like:
- Attempt to acquire a resource (file, network connection, whatever)
- If it fails, clean up anything left behind
- Otherwise, perform actions on the resource
- Log what happened
- Program complete
try:
    # attempt to acquire a resource
    db.commit()
except Exception:
    # If it fails, clean up anything left behind
    log.warning("Failure committing transaction, rolling back")
    db.rollback()
else:
    # If it works, perform actions
    # In this case, we just log success
    log.info("Saved the new FOO")
finally:
    # Clean up
    db.close()
# Program complete
Our previous example mapped to the steps above almost exactly. But how much of
this logic ever changes? Not very much.

Just about every time we save data, we'll do these exact same steps. We could pull this logic into a function, or we could use a context manager.

db = db_library.connect("fakesql://")
# as a function
commit_or_rollback(db)

# context manager
with transaction("fakesql://") as db:
    # retrieve data here
    # modify data here
    pass
A context manager makes it easy to protect some block by setting up resources
(context) that the block needs at runtime. In our example, we need a database
transaction that will be:
- Connected to a database
- Started at the beginning of the block
- Committed or rolled back at the end of the block
- Cleaned up at the end of the block
The context manager interface is simple. The object is required to have an
__enter__() method to set up whatever context is needed and an
__exit__(exc_type, exc_val, exc_tb) method that will be called at the end of
the block. If there was no exception, then all three of the exc_* arguments
will be None.

The __enter__ method will be pretty simple, so let's start with that.

class DatabaseTransaction(object):
    def __init__(self, connection_info):
        self.conn = db_library.connect(connection_info)

    def __enter__(self):
        return self.conn
The __enter__ method actually does nothing except return the database
connection, which we can use inside the block to retrieve or save data. The
__init__ method is where the connection is actually made, and if it fails the
block won't run at all.

Now let's define how the transaction will be finished in the __exit__ method.
This has a lot more to it, since it has to handle any exceptions thrown in the
block and close out the transaction.

    def __exit__(self, exc_type, exc_val, exc_tb):
        try:
            if exc_type is not None:
                # the block raised, so undo its work instead of committing
                self.conn.rollback()
            else:
                try:
                    self.conn.commit()
                except Exception:
                    self.conn.rollback()
        finally:
            self.conn.close()
Now we can use our DatabaseTransaction as the context manager for our block
of actions. Under the hood, the __enter__ and __exit__ methods will run and
handle setting up the database connection and tearing it down when we're through.

# context manager
with DatabaseTransaction("fakesql://") as db:
    # retrieve data here
    # modify data here
    pass
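To see the commit and rollback paths in action without a real database, here is a sketch that uses contextlib.contextmanager from the standard library, which builds the same __enter__/__exit__ pair from a generator function. FakeConnection is a made-up stand-in that only records which methods were called, and this transaction takes an existing connection rather than a connection string.

```python
from contextlib import contextmanager

class FakeConnection(object):
    """Hypothetical connection that records the calls made on it."""
    def __init__(self):
        self.calls = []
    def commit(self):
        self.calls.append("commit")
    def rollback(self):
        self.calls.append("rollback")
    def close(self):
        self.calls.append("close")

@contextmanager
def transaction(conn):
    try:
        yield conn            # the body of the with block runs here
    except Exception:
        conn.rollback()       # the block raised: undo and re-raise
        raise
    else:
        conn.commit()         # the block finished cleanly
    finally:
        conn.close()          # always runs

ok = FakeConnection()
with transaction(ok) as db:
    pass  # retrieve and modify data here
print(ok.calls)
# ['commit', 'close']

failed = FakeConnection()
try:
    with transaction(failed) as db:
        raise ValueError("bad data")
except ValueError:
    pass
print(failed.calls)
# ['rollback', 'close']
```

The generator form is shorter than a class, while the class form is easier to extend with extra state or methods. Either is a reasonable choice.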
To improve our (primitive) transaction manager, we could add handling for
different exception types. Even in its current state, this hides a ton of
complexity that you don't need to be worrying about every time you pull in
something from the database.

Generators
Introduced in Python 2.2, generators are a simple way to implement an iterator that doesn't hold all its values at once. Typically a function in Python starts its execution, does some operations, and returns the result (or nothing).

Generators are different.

def my_generator(v):
    yield 'first ' + v
    yield 'second ' + v
    yield 'third ' + v

print(my_generator('thing'))
# <generator object my_generator at 0x....>
Instead of return we use the yield keyword, which is what makes a generator
special. When calling my_generator('thing'), instead of getting the result of
the function we get a generator object, which can be used anywhere you could
iterate over a list or other iterable.

Most often, you'll use generators as part of a loop as below. The loop will continue until the generator stops yielding values.

for value in my_generator('thing'):
    print(value)
# first thing
# second thing
# third thing

You can also ask for values one at a time with the next builtin.
gen = my_generator('thing')
next(gen)
# 'first thing'
next(gen)
# 'second thing'
next(gen)
# 'third thing'
next(gen)
# raises StopIteration exception
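A small sketch that makes this stop-and-resume behavior visible. The generator body appends to a list (noisy_generator and log are names invented for this example), so you can see that nothing runs when the generator is created and that each next() call runs only as far as the following yield.

```python
log = []

def noisy_generator():
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2

gen = noisy_generator()
print(log)
# []  (creating the generator runs none of its body)

print(next(gen))
# 1
print(log)
# ['started']

print(next(gen))
# 2
print(log)
# ['started', 'resumed after first yield']
```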
After being instantiated, a generator doesn't do anything until it is asked for
a value. It will execute until the first yield and pass that value to the
caller, then wait (saving its state) until another value is requested.

Now let's make a generator that's a bit more useful than just giving back 3 hard-coded items. The classic generator example is an endless Fibonacci generator, so let's give that a try. It will start at 0 and give the sum of the prior two numbers for as long as you ask it to.

def fib_generator():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b
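One way to sample an endless generator safely is the standard library's itertools module: islice stops after a fixed number of items and takewhile stops when a condition fails. The generator is repeated here so the snippet runs on its own; first_ten and under_100 are just illustrative names.

```python
from itertools import islice, takewhile

def fib_generator():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first ten values from the endless sequence
first_ten = list(islice(fib_generator(), 10))
print(first_ten)
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Or keep taking values while a condition holds
under_100 = list(takewhile(lambda n: n < 100, fib_generator()))
print(under_100)
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```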
A while True loop in a function would normally be something to avoid because
the function would never return, but for a generator it's fine as long as
there's a yield in the loop. We do need to be careful to have an end
condition when we use this generator, because it will happily add numbers
forever.

Now let's use our generator to calculate the first Fibonacci number that's greater than 10,000.

# named limit rather than min, to avoid shadowing the builtin min()
limit = 10000
for number in fib_generator():
    if number > limit:
        print(number, "is the first fibonacci number over", limit)
        break
That was pretty easy, and we can make that number as large as we want and it
will still (eventually) come up with the first number larger than X in the
Fibonacci sequence.

Let's try out a more practical example. Paginating APIs is common practice to limit usage and avoid sending 50 megabytes of JSON (!!!) to a mobile device. First, we'll define the API we're using and then we'll write a generator around it to hide the paging from our code.

The API we're using is called Scream, a place where users can argue about restaurants they've eaten at or want to eat at. Their API for searching is pretty simple, and looks like this.
GET http://scream-about-food.com/search?q=coffee
{
    "results": [
        {"name": "Coffee Spot",
         "screams": 99
        },
        {"name": "Corner Coffee",
         "screams": 403
        },
        {"name": "Coffee Moose",
         "screams": 31
        },
        {...}
    ],
    "more": true,
    "_next": "http://scream-about-food.com/search?q=coffee&p=2"
}
Neat! They embedded the link to the next page in the API response so it'll be
extremely easy to get the next page when it's time. We can also leave off the
page number to just get the first page. To get the data, we'll use the
always-handy requests library and wrap it in a generator to display
our search results.
The generator will handle pagination and have limited retry logic, and will
work something like:
- Receive search term
- Query the scream-about-food API
- Try again if the API fails
- Yield the results from the page it gets one at a time
- Get the next page if there is one
- Exit when there are no more results
import requests

api_url = "http://scream-about-food.com/search?q={term}"

def infinite_search(term):
    # {term} is a named placeholder, so it must be filled by keyword
    url = api_url.format(term=term)
    while True:
        data = requests.get(url).json()
        for place in data['results']:
            yield place
        # end if we've gone through all the results
        if not data['more']:
            break
        url = data['_next']
When you create a generator, you only need to pass in search terms and the
generator will build the query and get results as long as they exist. There are
(of course) some rough edges here. Exceptions aren't handled at all (so the
retry step from the outline is missing), and if the API fails or returns
unexpected JSON the generator will raise an exception.

Despite these rough spots, we can still use it to find out what number our restaurant is in the search results for the term "coffee".
# pass a number to start at as the second argument if you don't want
# zero-indexing
for number, result in enumerate(infinite_search("coffee"), 1):
    if result['name'] == "The Coffee Stain":
        print("Our restaurant, The Coffee Stain is number", number)
        break
else:
    # the else clause of a for loop runs only if the loop never hit break
    print("Our restaurant, The Coffee Stain didn't show up at all! :(")
The generator handles iterating over each page of search results, so all we
have to do is use the enumerate builtin from earlier in the article to keep
track of the number of results and print them when we find our shop.

As an exercise, go ahead and add a counter to the infinite_search generator
so we can write code like this instead.

for result in infinite_search("coffee"):
    if result['name'] == "The Coffee Stain":
        print("Our restaurant, The Coffee Stain is number", result['number'])
        break
else:
    print("Our restaurant, The Coffee Stain didn't show up at all! :(")
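One possible answer to the exercise, sketched so that it runs without network access: the page-fetching function is passed in as a parameter. The names numbered_search, fetch, fake_fetch, PAGES, and max_retries are all invented for this example and are not part of requests or the Scream API. It also adds the limited retry step from the outline. With the real API you could pass something like lambda url: requests.get(url).json() as fetch.

```python
# Hypothetical canned responses standing in for the Scream API
PAGES = {
    "page1": {"results": [{"name": "Coffee Spot"}, {"name": "Corner Coffee"}],
              "more": True, "_next": "page2"},
    "page2": {"results": [{"name": "The Coffee Stain"}],
              "more": False, "_next": None},
}

def fake_fetch(url):
    return PAGES[url]

def numbered_search(first_url, fetch, max_retries=3):
    """Yield each result with a 1-based 'number' key added."""
    url = first_url
    number = 0
    while True:
        for attempt in range(max_retries):
            try:
                data = fetch(url)
                break
            except Exception:
                # give up and re-raise after the last attempt
                if attempt == max_retries - 1:
                    raise
        for place in data['results']:
            number += 1
            # build a new dict so the fetched data is not modified
            yield dict(place, number=number)
        # end if we've gone through all the results
        if not data['more']:
            break
        url = data['_next']

for result in numbered_search("page1", fake_fetch):
    if result['name'] == "The Coffee Stain":
        print("Our restaurant, The Coffee Stain is number", result['number'])
        break
else:
    print("Our restaurant, The Coffee Stain didn't show up at all! :(")
# Our restaurant, The Coffee Stain is number 3
```

Retrying immediately is the simplest policy. A production version would likely catch narrower exception types and wait between attempts.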
If you write Python 3, you already use lazy iteration when you use the standard
library. Calls like dict.items() now return lightweight view objects instead of
lists, and builtins like map() and zip() return iterators. To get similar
behavior in Python 2, dict.iteritems() was added, but isn't as frequently used.

Python 2 and 3 compatibility
Moving from Python 2 to Python 3 can be an undertaking for any codebase (or any
developer) but it's possible to write code that runs in both. Support for
Python 2.7 will continue until 2020, but it's unlikely that many new features
will be backported. For now, it's recommended to support Python 2.7 and 3+
unless it's feasible for you to drop Python 2 support entirely.
For a comprehensive guide on supporting both versions, see the
Porting Python 2 Code guide from python.org.

Let's look over the most common things you'll run into when trying to write compatible code, and how to use __future__ to work around them.

print or print()
Just about every developer who has switched from Python 2 to 3 has typed the
wrong print statement. Fortunately, you can standardize on using print as a
function (Python 3 style) instead of a keyword by just importing
print_function.

print "hello" # Python 2
print("hello") # Python 3

from __future__ import print_function
print("hello") # Python 2
print("hello") # Python 3
Divided Over Division
The default behavior for division in Python also changed between 2 and 3. In
Python 2, dividing integers would perform integer-only division, chopping off
any trailing decimals. This wasn't what most users expected, so it was changed
in Python 3 to use floating point division even when dividing integers.
print(1 / 3) # Python 2
# 0
print(1 / 3) # Python 3
# 0.3333333333333333
print(1 // 3) # Python 3
# 0
This sort of behavior change brings in a bunch of subtle bugs when writing code
to run in both major versions. Again, we're saved by the __future__ module.
Importing division makes these behaviors identical in both versions.

from __future__ import division
print(1 / 3) # Python 2
# 0.3333333333333333
print(1 // 3) # Python 2
# 0
print(1 / 3) # Python 3
# 0.3333333333333333
print(1 // 3) # Python 3
# 0
Fin - Thanks for Reading
Thanks for reading, I hope you learned at least one thing. If you have
something to add (or correct, no writer is perfect) I'll be checking the
comments section frequently. If you enjoyed this article, you might want to
check out this one on list and dict comprehensions or
a more in-depth treatment of Python 2 and 3.
Thanks to commenters dalke (on HackerNews), György Kiss, mikemikemikemikemike,
Karl-Aksel Puulmann, Bartłomiej "furas" Burek, and Peter Venable for
finding errors and omissions in this article.