### Doug Wade
Platforms team
[doug.wade@redfin.com](mailto:doug.wade@redfin.com)
Aside:
I'm mostly a javascriptista, but Python was the first language I liked
I've been at Redfin almost a year
I work mostly on front end build, especially performance
Python
Redfin Official Scripting Language
nagios alerts
endpoint testing
data imports
walk score deployments
These are smaller uses; just emphasizing Python's ubiquity
Our Python Nagios alerts are a subset, used to monitor Jenkins
Our test team uses Python to run performance tests against the site
The data team uses Python for listings and photo imports
Walk Score uses boto to identify AWS hosts and multiprocessing to manage worker pools
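
The pool pattern mentioned for Walk Score can be sketched roughly as follows. This is an illustration only, not the actual deployment code: the host list is hard-coded rather than looked up with boto, and `deploy_to_host` is a hypothetical stand-in for the real per-host work. It uses `multiprocessing.pool.ThreadPool`; the process-based `multiprocessing.Pool` exposes the same `map` interface.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical host names; in practice these would come from an AWS lookup.
HOSTS = ["walkscore-web-1", "walkscore-web-2", "walkscore-web-3"]


def deploy_to_host(host):
    """Stand-in for the real per-host work; returns (host, status)."""
    return (host, "ok")


def deploy_all(hosts, workers=4):
    """Run deploy_to_host on every host using a fixed-size worker pool."""
    pool = ThreadPool(processes=workers)
    try:
        # map() blocks until every host is done and preserves input order.
        return pool.map(deploy_to_host, hosts)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for host, status in deploy_all(HOSTS):
        print(host, status)
```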
# Agenda
- Build images: dirpy
- Build: Bazel
- Deploy: Fabric
- Test: ETL Testing
- Test: REF
Aside:
We're currently working on creating a continuous deployment pipeline
Many parts of the pipeline are written in Python
Our Python codebase is growing rapidly
dirpy
Server-side image cropping + resizing
Resizes arbitrary images from MLS to fit standard sizes
Handles 200 images a second in prod
Resizes a single image in 6 - 8 ms
built with Pillow
Resized 156 images a second on a reference VM with 4 cores and 4 GB of memory.
Scales to 2000 imgs/sec; we run at 200 imgs/sec in prod
Planning on open-sourcing it
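
To make the crop-and-resize step concrete, here is a minimal sketch using Pillow. It is not dirpy's code: the center-crop policy and the function names `center_crop_box` and `resize_image` are assumptions made for illustration. The box arithmetic is kept separate so it can be checked without any image files.

```python
def center_crop_box(src_size, target_size):
    """Return the (left, upper, right, lower) box that center-crops
    src_size to the aspect ratio of target_size."""
    src_w, src_h = src_size
    tgt_w, tgt_h = target_size
    if src_w * tgt_h > tgt_w * src_h:
        # Source is wider than the target: trim the left and right edges.
        crop_w = src_h * tgt_w // tgt_h
        left = (src_w - crop_w) // 2
        return (left, 0, left + crop_w, src_h)
    # Source is taller than (or matches) the target: trim top and bottom.
    crop_h = src_w * tgt_h // tgt_w
    top = (src_h - crop_h) // 2
    return (0, top, src_w, top + crop_h)


def resize_image(src_path, dst_path, target_size):
    """Crop and resize one image file with Pillow (hypothetical helper)."""
    from PIL import Image  # imported here so the helper above has no dependencies

    with Image.open(src_path) as img:
        box = center_crop_box(img.size, target_size)
        img.crop(box).resize(target_size).save(dst_path)
```

For a 4000x3000 source and a 400x400 target, `center_crop_box` keeps the middle 3000x3000 square before the resize.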
Bazel
Fast, reproducible build system
Build rules are written in Skylark, a subset of Python
Builds api server
Builds distribution tarballs
Adding support for building node modules
Written at Google; uses hermeticity to guarantee reproducibility, and a distributed artifact cache for performance
This is a big part of our move to continuous deployment, to get builds under 6 minutes
We've struggled with our node module builds because npm violates some of Bazel's assumptions
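
The link between hermetic builds and caching can be sketched in a few lines of Python. This shows the general idea only and is not how Bazel is implemented: if an action's output depends solely on its command and declared inputs, a hash of those can serve as a cache key. `action_key` and `ArtifactCache` are hypothetical names, and the in-memory dict stands in for a distributed cache.

```python
import hashlib


def action_key(command, input_files):
    """Hash a build command together with the contents of all its inputs.

    input_files maps file name -> bytes content.
    """
    digest = hashlib.sha256()
    digest.update(command.encode("utf-8"))
    for name in sorted(input_files):  # sorted so dict order never matters
        digest.update(b"\0" + name.encode("utf-8") + b"\0")
        digest.update(input_files[name])
    return digest.hexdigest()


class ArtifactCache:
    """In-memory stand-in for a shared artifact cache."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def build(self, command, input_files, run_action):
        """Return the cached output for this action, running it only on a miss."""
        key = action_key(command, input_files)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = run_action(command, input_files)
        return self._store[key]
```

Changing any input byte changes the key, so a stale artifact is never reused; an undeclared input (the kind of thing npm tends to introduce) would silently break that guarantee.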
Bazel
def _external_npm_module(ctx):
    return struct(
        internal = False,
        transitive_internal_deps = [],
        transitive_external_deps = [ctx.attr.raw_target] + ctx.attr.runtime_deps
    )

external_npm_module = rule(
    implementation = _external_npm_module,
    attrs = {
        "raw_target": attr.label(allow_files = True),
        "runtime_deps": attr.label_list(allow_files = True)
    },
)
This is an example rule: `_external_npm_module` is the implementation function, and the `rule(...)` call defines `external_npm_module`, which build files then invoke
Note that Skylark is a deliberately restricted subset of Python, which helps enforce hermeticity
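
For context, a BUILD file might invoke the rule roughly like this. Every label and the `.bzl` path below are hypothetical, chosen only to show the shape of the call; the attribute names `raw_target` and `runtime_deps` come from the rule definition above, and `name` is the attribute every rule accepts.

```python
# BUILD file sketch (Skylark syntax); all labels and paths are hypothetical.
load("//tools/npm:rules.bzl", "external_npm_module")

external_npm_module(
    name = "left_pad",
    raw_target = "//third_party/npm:left_pad_files",
    runtime_deps = [],
)
```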
Fabric
streamlines deploy and sysadmin tasks
Manages: servers - databases - solr - ci fleet - deployments
Used almost exclusively for deployments
Deploys are driven by a Google Doc that is translated to XML; the run then writes timing data back to the Google Doc
We manage everything except a small subset of ops boxes that we don't deploy to
Fabric
from fabric.api import *
import utils.configuration
from utils.deploylogger import log_run
import time
import os


def do_work(config, silent_mode=False, args=None):
    '''Start apache on each server.'''
    # log_run returns a result string and a return code
    retstring, retval = log_run(config, 'apachectl start', args)
    if retval:
        # Non-zero return code: pass the failure back to the caller
        return (retstring, retval)
    return ('Success', 0)
This is an example deploy step: every step defines a function named do_work, which is run against each host
We define groups of hosts that can be operated on, or single boxes
It helped us coordinate deploys when developers managed deploy steps individually
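
A rough sketch of how a runner could apply a `do_work`-style step to a group of hosts and collect timing data. This is not the actual deploy tooling: `run_step`, `start_apache`, and the way the host is passed through the config dict are all assumptions for illustration, and no Fabric calls are made.

```python
import time


def start_apache(config):
    """Hypothetical step in the do_work style: returns (message, return code)."""
    return ("Success", 0)


def run_step(do_work, hosts, base_config):
    """Call do_work once per host, stopping at the first failure.

    Returns a list of (host, message, return code, seconds) tuples.
    """
    results = []
    for host in hosts:
        config = dict(base_config, host=host)  # per-host copy of the config
        started = time.time()
        message, retval = do_work(config)
        results.append((host, message, retval, time.time() - started))
        if retval:
            break  # do not continue the deploy after a failed host
    return results
```

The per-host durations in the result are the kind of timing data that could be written back to a tracking document.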
ETL Testing
Black box verification of ETL pipelines
defines test cases
manages creation and deletion of source data
manages execution and failure detection of the pipeline
is extensible to new sources/destinations
built with py.test and Spring Python
We use it for the analytics cd pipeline to make sure changes are good
Uses an existing EC2 instance to grab data from an S3 bucket and put it in Redshift
Uses text files to test loads into our PostgreSQL instances
Does regression testing for the data team
ETL Testing
trivial assertion
This is a very simple test case that only asserts that 2+2 = 4
In real tests, you provide source, dest, and a pipeline
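
A test in this style might look roughly like the sketch below, written as plain py.test-style functions. It is not the framework's API: `run_etl_case` and `uppercase_pipeline` are hypothetical names, and in-memory lists stand in for real sources and destinations such as S3, Redshift, or PostgreSQL.

```python
def test_trivial():
    # The simplest possible case: no pipeline involved at all.
    assert 2 + 2 == 4


def uppercase_pipeline(source_rows):
    """Hypothetical pipeline under test: upper-cases the city of each row."""
    return [dict(row, city=row["city"].upper()) for row in source_rows]


def run_etl_case(source_rows, pipeline, expected_rows):
    """Create source data, run the pipeline, and compare the destination."""
    source = list(source_rows)      # stands in for creating source data
    destination = pipeline(source)  # stands in for executing the pipeline
    assert destination == expected_rows, "pipeline output did not match"


def test_uppercase_pipeline():
    run_etl_case(
        source_rows=[{"id": 1, "city": "Seattle"}],
        pipeline=uppercase_pipeline,
        expected_rows=[{"id": 1, "city": "SEATTLE"}],
    )
```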
REF
Redfin Experiments Framework
performs A/B testing
integrates with our feature toggle, Bouncer
used to test major features before release
counts events from Redshift table and generates static site
Something extra -- try to cut this
Probably not the best example, but handles millions of events daily
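
The count-then-render flow can be sketched as below. This is an illustration under assumptions, not REF's code: rows are taken to be `(variant, event_name)` pairs as they might come back from a Redshift query, the event names `exposure` and `conversion` are invented for the example, and `conversion_rates` and `render_report` are hypothetical helpers.

```python
from collections import Counter


def conversion_rates(events):
    """Compute per-variant conversion rate from (variant, event_name) rows."""
    exposures = Counter()
    conversions = Counter()
    for variant, event_name in events:
        if event_name == "exposure":
            exposures[variant] += 1
        elif event_name == "conversion":
            conversions[variant] += 1
    # Only variants with at least one exposure appear, so no division by zero.
    return {
        variant: conversions[variant] / float(count)
        for variant, count in exposures.items()
    }


def render_report(rates):
    """Render the rates as a small static HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{:.1%}</td></tr>".format(variant, rate)
        for variant, rate in sorted(rates.items())
    )
    return "<table><tr><th>Variant</th><th>Conversion</th></tr>{}</table>".format(rows)
```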
Questions?
This couldn't possibly work, could it? It can't be that easy, can it? But it works!
- Dan Fabulich
Slides: http://redfin.github.io/slides/python-at-redfin/
Mostly, our experience in Python has been: it shouldn't be this easy, but it is
Dan / jlo's story about starting a thread / process pool