In this post, I will share a collection of notes and highlights of what I learned from PyTN 2019. The full conference schedule is available online. If you’re interested in the full selection of talks, their actual titles, or the people who gave them, I encourage you to check it out. The summary I present below is just my impression and my key takeaways from each of the sessions.
Before I dig into each of the sessions, some important context: I do not spend as much of my work days in Python as I would like to. I spend most of my time working in SQL and jinja on our dbt models and thinking about how we model data, how we improve our team’s workflow, and how we set the organization up for self-serve data through the business intelligence tool we use. Most of my time spent in Python is on nights and weekends, poring over a computer outside of work hours.
Admittedly, this means I am not moving as quickly as I’d like to. I find the time and energy a couple of times a week to work on this because this is something that is important to me.
I mention this because this context will help you understand why I got what I did out of each of these talks. For example, Lynn Root’s talk on asyncio at Spotify is light on notes, not because it wasn’t a good talk, but because it was a bit more over my head than some of the others. The notes I’ve taken reflect not the talks as delivered but how I understood them.
- Opening Keynote: Running from Zombies = Agent-Based Modeling
- Time Series Analysis
- Identifying Influencers via Slack
- Choosing Kaizen over a Rewrite
- Python Data Types
- Hands-on NLP Workshop
- Decorator Taxonomy
- Closing Keynote: Where do the old coders go?
I always pick a project that I’m going to hack away on during a conference. I didn’t finish this one yet, but the short version: I tried to do the Storytelling with Data Challenge using my own Goodreads data. Even though I missed the challenge deadline, I’ll be sure to blog about it when it’s done.
Running from Zombies = Agent-Based Modeling
Jackie Kazil is a maintainer of Mesa, an agent-based modeling framework in Python.
From economics, we learn that humans are not rational. Agent-based modeling allows you to create a simulation of what complex agents might do.
Your watch is complicated, but your family is complex.
(I didn’t catch the source of this quote. Google has failed me. If you know where I can credit this, please let me know, and I’ll update this post!)
- Includes a human element
- A pattern to the chaos (e.g., birds flocking)
Examples of Agent-Based Modeling in the wild:
In one of the runs of the Wolf Sheep Predation model, all the wolves died. I couldn’t help but wonder: without wolves, is there a point where the sheep population starts to decline from over-consumption of grass?
The activation (scheduling) of agents, which can occur in sequence, at random, or simultaneously, affects results.
When thinking about agent-based modeling (or, in my opinion, any modeling), it’s important to consider Chesterton’s Fence.
Chesterton’s fence is the principle that reforms should not be made until the reasoning behind the existing state of affairs is understood.
Key Takeaway: Before making a change to a model, let’s consider why it was configured that way in the first place.
InfluxDB for Time Series Analysis
Noah Crowley’s talk was more about measuring production data live, as opposed to the sort of time series modeling I usually do. While the implementations are different, the overall premises are the same.
For an app, you might want to create middleware to capture metrics about performance without negatively affecting performance. This can allow for two-way activation of metrics.
Consider abstracting some of the data results, such as:
- Calculating a moving average (simple)
- Controlling for seasonality
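As a rough illustration of the first abstraction (my own sketch, not code from the talk), a simple moving average smooths a stream of measurements over a fixed window:

```python
from collections import deque

def moving_average(values, window):
    """Yield the simple moving average over a fixed-size trailing window."""
    buf = deque(maxlen=window)  # oldest value is dropped automatically
    for v in values:
        buf.append(v)
        yield sum(buf) / len(buf)

# Early values average over a shorter window until it fills up.
print(list(moving_average([10, 20, 30, 40], window=2)))  # [10.0, 15.0, 25.0, 35.0]
```

The same generator shape works whether the values come from a list, a database cursor, or a live metrics stream.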
Identifying Influencers via Slack
Eva Sasson’s session chronicled a real experience of using Natural Language Processing over a company’s Slack archive to understand which folks were answering the most work-related questions (playing the role of “Company Influencer”) and thus could be termed “knowledge holders.”
Network Analysis considerations:
- Complete network
The dataset was all Slack messages for the company. Usernames changed over time but User IDs didn’t, emphasizing the importance of familiarizing yourself with the data before working with it.
After building an adjacency matrix of communications, they found one node with more centrality than the others; it turned out to be a Trello bot. By removing that node from the analysis, they also removed all the people who only interacted with that bot.
The question-classification process had defined criteria: NLP was used to identify (1) that a message was a question and (2) that it was work-related.
In totality, they used:
- Network Analysis
- Graph Theory
- Rule-Based Modeling
- Natural Language Processing
- Unsupervised Machine Learning
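To make the rule-based side of this concrete, here is a toy sketch of the two-part question check described above. This is my own illustration, not the speaker’s code, and the starter words and work terms are invented placeholders:

```python
QUESTION_STARTERS = ("how", "what", "why", "where", "who", "when", "does", "can")
WORK_TERMS = {"deploy", "database", "api", "budget", "meeting"}  # hypothetical vocabulary

def is_question(message: str) -> bool:
    # Rule 1: ends with '?' or opens with a question word.
    text = message.lower().strip()
    return text.endswith("?") or text.startswith(QUESTION_STARTERS)

def is_work_related(message: str) -> bool:
    # Rule 2: mentions at least one work-related term.
    words = set(message.lower().replace("?", "").split())
    return bool(words & WORK_TERMS)

msg = "How do I deploy the api?"
print(is_question(msg) and is_work_related(msg))  # True
```

Real pipelines would replace the keyword sets with trained models, but the two-stage structure (question? then work-related?) stays the same.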
Considerations for doing at home:
|Consideration|Example from Talk|
|---|---|
|Use a complete network.|All Slack messages, including DMs|
|Clean your data.|Removing Trellobot|
|Explore your data.|Using User ID instead of username|
|Be careful about GIGO (Garbage In, Garbage Out).| |
|Experiment.|It can take multiple iterations to get all questions with appropriate filters.|
Key Takeaway: Complex analyses can be helpful to a business, but sometimes there is no need to over-engineer something. You probably could have just asked people “Where would you go to get a question answered?” to see patterns in this particular case. There are other applications of this analysis, though, that could be more useful.
Choosing Kaizen over a Master Rewrite
“Kaizen” is the Japanese business process, made popular by Toyota, of constant improvement. Brandon Williams (of Ramsey Solutions!) reminded the audience that rewrites never go as smoothly as people think they will (and are rarely, if ever, completed on plan).
Instead of a rewrite, he suggests 3 steps towards making 1% improvements every day.
- Code jail: isolate existing code into a function so you can test it, but do not edit it.
- Characterization tests: much like Chesterton’s Fence (see above), you don’t know all the functionality that your code might be trying to implement; characterization tests capture all the existing functionality.
- Make changes: refactor vs. new feature.
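A minimal sketch of the first two steps, with my own made-up `legacy_price` standing in for whatever code you’ve put in “code jail”:

```python
def legacy_price(quantity, unit_cost):
    # Step 1, "code jail": legacy logic isolated into a function, untouched.
    total = quantity * unit_cost
    if quantity > 10:
        total = total * 0.9  # mystery discount nobody remembers adding
    return total

def test_characterization():
    # Step 2: pin down the behavior the code has TODAY, whether or not it's "right".
    assert legacy_price(5, 2.0) == 10.0
    assert legacy_price(20, 1.0) == 18.0  # the mystery discount is now documented

test_characterization()
print("characterization tests pass")
```

With those tests in place, step 3 (refactoring or adding features) can proceed with a safety net.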
Key Takeaway: Choose 1% improvements over huge rewrites.
asyncio at Spotify
This was a little over my head, but here’s what I got:
serial != blocking
async != concurrent
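My own toy example of the second point, that async code isn’t automatically concurrent: awaiting coroutines one after another still runs them serially, and concurrency only happens when you schedule them together (here with `asyncio.gather`):

```python
import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    # Serial: "b" doesn't start until "a" finishes, even though nothing blocks the loop.
    a = await work("a", 0.1)
    b = await work("b", 0.1)
    # Concurrent: both sleeps overlap, so this pair finishes in ~0.1s, not ~0.2s.
    c, d = await asyncio.gather(work("c", 0.1), work("d", 0.1))
    return [a, b, c, d]

print(asyncio.run(main()))  # ['a', 'b', 'c', 'd']
```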
Python Data Types
Casey Faist presented all the ways in which data types make development easier. Data types can make the code you write much easier to consume (and to write).
Example ways to use data types:
- Communicate state
- Save time
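One way to see the first point (my own example, not from the talk): an `Enum` communicates state far better than magic strings, because the set of valid states is explicit and typo-proof:

```python
from enum import Enum, auto

class OrderState(Enum):
    PENDING = auto()
    SHIPPED = auto()
    DELIVERED = auto()

def can_cancel(state: OrderState) -> bool:
    # The type tells the reader exactly which states exist; no "pnding" typos.
    return state is OrderState.PENDING

print(can_cancel(OrderState.PENDING))  # True
print(can_cancel(OrderState.SHIPPED))  # False
```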
Hands-on Intro to NLP
The most popular spoken language is Mandarin. There are about 7,000 languages worldwide. Given that volume, we could not possibly hand-map every language to something a computer can understand. (If a computer understands language like a person, that’s Artificial Intelligence.)
Natural Language is what we speak; it’s developed naturally in use.
Natural Language Processing focuses on:
- the interaction between computers + human languages
- how computers interact with users
NLP can be text or speech based; examples include Siri, sentiment analysis, and email spam filters.
“Ghoti” = “Fish”
- “gh” from “tough”
- “o” from “women”
- “ti” from “nation”
Challenges in NLP:
- Ambiguity (Who is “She” referring to?)
- Segmentation (Start and end of a word)
- Idioms (“Stuck between a rock and a hard place”)
- World Knowledge
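The segmentation challenge is easy to demonstrate in a few lines. This is my own sketch (not from the workshop notebook), contrasting naive whitespace splitting with a slightly smarter rule-based tokenizer:

```python
import re

text = "NLP is fun. Isn't it? Let's tokenize!"

# Naive segmentation: punctuation stays glued to words ("fun.", "it?").
naive = text.split()

# Rule-based tokenizer: words (with optional apostrophes) or sentence punctuation.
tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.?!]", text)

print(naive)
print(tokens)  # ['NLP', 'is', 'fun', '.', "Isn't", 'it', '?', "Let's", 'tokenize', '!']
```

Even this tiny example shows why real NLP libraries ship dedicated tokenizers rather than relying on `str.split`.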
We worked through this Jupyter Notebook.
This was the best workshop of the entire conference! Grishma was patient, helpful, and incredibly thoughtful throughout her presentation. I would recommend this talk to anyone who is familiar with Python and Data. She made this incredibly complicated subject of Natural Language Processing incredibly approachable!
Decorator Taxonomy
Andy Fundinger asserted that there are six kinds of decorators.
|Type|Example from Talk|
|---|---|
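I didn’t capture the full taxonomy, but as a reminder of the basic mechanic all six kinds build on, here is one common flavor (my own example, not from the talk): a wrapping decorator that runs extra behavior around the original function.

```python
import functools

def log_calls(func):
    """A 'wrapping' decorator: runs extra behavior around the original function."""
    @functools.wraps(func)  # preserves func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a, b):
    return a + b

print(add(2, 3))  # prints "calling add", then 5
```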
Where do all the old coders go?
In Jesse Jiryu Davis’s Closing Keynote, he presented some powerful stats on how young the tech industry is and continues to be. He suggested that this is for three reasons:
- Age discrimination (get pushed out)
- Tired by skills treadmill
- Lured out by adventure
Age discrimination strikes hardest when on the job market, instead of when on the job.
The skills treadmill is triggered by constantly changing technologies: you become good at many but master none because you’re always trying to “keep up.”
Adventure can look like many things; to some, it looks like management. His suggested alternatives for staying technical:
- Learn - new technologies to new depths
- Mentor - new, younger developers, especially those who are underrepresented
- Lead - identify best practices and lead by example without being in management
Here are my key takeaways:
- Before making a change to a model, let’s consider why it was configured that way in the first place.
- Complex analyses can be helpful to a business, but sometimes there is no need to over-engineer something.
- Choose 1% improvements over huge rewrites.
- “Ghoti” = “Fish” (just kidding, kinda)
- Learn, Mentor, Lead
PyTN 2019 was an amazing learning opportunity! I have tons that I took away from the conference. I am already counting down until next year.