My 10 Favorite Posts on Data (and Management) of 2021
I’ve sent nearly 40 newsletters in 2021, which together include hundreds of links to things that interested me at different points of the year. This is a quick summary of the things that stand out in my mind as favorites, now that the year is almost over.
To be possibly pedantic, ‘favorite’ doesn’t mean that I agreed with everything in any post or video: many of these have aspects that I don’t understand, or strongly disagree with. But they have stuck with me and influenced my thinking. I hope that by highlighting them here, I’ll be able to find them more easily in 2030 when I’m thinking about what influenced me over the coming decade.
While the newsletter and many of these posts are on the topic of ‘data’, I spend more time thinking about management and organizational design in my current role. That shifting of interests is definitely reflected here.
These posts are in chronological order, not ranked.
Designing Engineering Organizations (Jacob Kaplan-Moss, January 8th)
In my first newsletter of the year, I linked to a piece that made me feel really good about some of the organizational changes that my team had just made a few months before. Here’s the punchline to the piece:
[T]he most effective teams are stable, multi-disciplinary, aligned to product delivery.
A year on, I have somewhat mixed feelings here. I really enjoy Kaplan-Moss’s writing, especially how tight the focus of his pieces is. But I took the time to read Team Topologies last year as well, and I think I lean towards Skelton & Pais’s view of organizational structure.
One thing that has annoyed me about engineering organizational writing is that the authors (in general) seem much more concerned about teams being too big than being too small. Maybe it’s just a sign of the times and competitive job market, but I worry much more about silos, burnout from constant on-calls, and overlapping responsibilities between teams than I worry about having ‘too many people [to put on a deliverable] to form a single team’. Sounds like a nice problem to have!
Let Them Log Scale (Jessica Hullman, February 12th)
Compared to previous years, I saw relatively few great posts about data visualization in 2021 (relating to either theory or practice). My favorite link out of this bunch was Hullman’s, by a wide margin. I used to teach data visualization, and ‘what are the measurable outcomes of good/bad visualizations’ has always been a tricky question for me to internalize.
Hullman links to a blog post about a study (yes, tertiary source) about the impact of using log scales during COVID-19 (when everyone was looking at graphs every day).
[W]hen people are exposed to a logarithmic scale they have a less accurate understanding of how the pandemic unfolded until now, make less accurate predictions on its future, and have different policy preferences than when they are exposed to a linear scale.
This hasn’t come up in life yet, but I’m keeping it handy for the next time some fellow viz nerd tells me that a log scale is more appropriate for a presentation!
Taking Criticism While Privileged (Pamela Oliver, April 16)
This piece (and the full essay it’s adapted from) was, without a doubt, the piece of writing I had the strongest emotional reaction to this year. I know it’s a bit older, but I don’t think I could put together a list of 2021 favorites without including it. Although it was written for academia, I think it’s pertinent for anyone who manages and strives to build a culture that values open dialogue with their team and throughout their company.
How should you respond when a student says something you said or did was domineering, insensitive, racist or sexist? What if you think that criticism is unfair or inappropriate? What if you think the critic has a good point and you feel bad about it?
This piece resonated with me in a way the more popular texts about the daily application of privilege have not. Since reading this, I think I’ve become more appreciative and understanding of the things I can’t change and the agreements I can’t broker, and surprisingly I believe it has made me more effective and better to work with.
Churn is Hard (Randy Au, April 16)
I’ve really enjoyed Randy’s Substack this year: it’s one of my go-tos when I’m looking for a NYC-area post and I haven’t seen anything over the course of the week.
There’s no silver bullets, just grind
This post (and one of the twitter threads it references) is terrific in talking about the crux of big conceptual problems like churn, which is identifying the group of people you want to understand better. Your userbase is likely to be deeply heterogeneous, and even products with relatively simple interfaces might support many different usage patterns. So understanding patterns of user action requires a lot of work to segment out all the users who interact with your product in a different way.
Finding Structure in Users’ Evolving Listening Preferences (Passino et al, April 30)
I’ve often wondered how we can understand ‘growing up’ online. So much of the time, when we talk about people interacting with our websites or products, we think of them as some sort of fixed ‘persona’, while in our regular lives we experience changing interests and constraints all the time.
[T]o better serve users in the long run we need to understand how their long-term preferences evolve over time.
This piece (based on the paper) tackles this via transition matrices. Since music comes with a lot of cultural context, even without a real understanding of how Spotify works I can see some patterns that seem intuitive to me (soul -> motown -> rock) and some that don’t (house -> country?). I do wish the paper had covered a time period longer than 6 months, though.
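For readers who haven't worked with transition matrices, here is a minimal sketch of the general idea: count how often one genre is followed by another, then normalize each row into probabilities. The listening histories are made up for illustration, the function name is hypothetical, and this is not a reproduction of the paper's actual method or data.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate first-order transition probabilities from ordered sequences.

    Returns a nested dict: matrix[current][next] = P(next | current).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each item with the one that follows it in the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    matrix = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize the row so its probabilities sum to 1.
        matrix[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return matrix


# Hypothetical listening histories (invented for this example).
histories = [
    ["soul", "motown", "rock"],
    ["soul", "motown", "motown"],
    ["soul", "soul", "rock"],
]

probs = transition_matrix(histories)
for genre, row in probs.items():
    print(genre, row)
```

Each row reads as "given that someone was listening to this genre, here is where they went next", which is the kind of pattern (soul -> motown -> rock) described above.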
There was also some fun trivia in the piece. The authors used 4,430 musical genres! (you can see them here: they’re now up to 5,701 as I write this).
Potemkin Data Science (Michael Correll, June 14)
I had a conversation with a friend who was shaken to the core reading this piece. The feeling that the work we do doesn’t matter is a deeply disquieting thought. Data Science certainly doesn’t have a monopoly on this, but I know many people who work in analytical roles worry about whether or not the people who make decisions are open to changing their minds based on ‘what the numbers say’.
[A] lot of the dashboards we were seeing appeared to be for “decision-laundering:” justifying stuff that had already been decided at levels above us.
I really enjoy Correll’s writing here, but my biggest gripe is the passivity of the data actors in his scenarios. The lack of buy-in or support for making decisions comes from two-dimensional executives who come across as either cynics or dilettantes. Where is the Data Science Manager or Executive in this, understanding the constraints of their partners and identifying the right ways for the teams to work together? Where is the trust-building and the attempt to measure the utility of these data insights?
Building a Data Team at a Mid-Stage Startup: A Short Story (Erik Bernhardsson, July 12)
I have really enjoyed Bernhardsson’s thought experiments, and this is my favorite of the set he’s written over the last few years. People who work in analytical roles talk a lot about organizational structure, but this is the first piece I’d seen that really dealt with the experience of building out a team, and the tactical constraints and decision-making that go into it.
The backdrop is: you have been brought in to grow a tiny data team (~4 people) at a mid-stage startup (~$10M annual revenue), although this story could take place at many different types of companies.
For me, a lot of the story rang true, and parts felt eerily familiar. I’ve seen that the tension between ‘I want to work on the cool stuff’ and ‘I need someone to do the stuff that the business needs’ can be really stark if expectations for roles are not set up correctly. I’ve also seen the road to data products appear after digging out from deep organizational and technical debt.
The pattern of ‘Data Team as expensive QA’ comes up in this piece; I see this as the most gratifying way that data teams have added value to a product org. Being able to say ‘you have this issue in a pattern/flow you aren’t testing well’, and seeing the numbers respond immediately when the issue is fixed, can feel like magic.
The Untold Story of SQLite (Adam Gordon Bell & Richard Hipp, July 12)
I love love loved this piece! From the genesis of the idea (on a battleship!) to early views of the smartphone market to the testing coverage work (I had actually read about the project's code coverage before, but it's breathtaking), it's all just mind-blowing.
Adam: 100,000 distinct test cases, and then they’re parametrized, so then, how many …
Richard: Yes, so we’ll do billions of tests.
Adam: Oh, wow.
Richard: Yeah. We have a check list and we will run tests for at least three days prior to a release.
I’m partial to hearing from old-timers about how the critical infrastructure we rely on every day is built and maintained. This was the best thing from that genre I’ve read in a bit.
Richard Hipp does come across as a bit… eccentric here at times (your own VCS? Your own mail server?) but he’s funny and thoughtful, and it makes for a great conversation and transcript.
Pseudo-R²: A Metric for Quantifying Interestingness (David Robinson, September 2)
My favorite quantitative piece this year introduced me to McFadden’s pseudo-R², which I wasn’t familiar with before. The idea of using this as an automated sweep through different potential cuts of data seems like it could be really helpful with a first pass of ‘data-mining’ large datasets and pointing to areas that deserve further, manual investigation.
Pseudo-R² thus balances the variation in groups with the composition: it rewards groupings where there are common categories that have unusually high or low success rates.
Robinson’s writing on the topic is very clear, despite the technical subject matter, and the graphs and examples are all really well conceived. Overall, I came away from this really impressed.
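To make the metric a little more concrete, here is a rough sketch of McFadden's pseudo-R², defined as 1 minus the ratio of the fitted model's log-likelihood to the null model's log-likelihood. It assumes the simplest possible setup: the "model" predicts each group's own success rate, and the null model predicts the overall rate for everyone. The function names and the example counts are hypothetical, and this is not necessarily how Robinson's post computes it.

```python
import math


def bernoulli_log_likelihood(successes, trials, p):
    """Log-likelihood of `successes` out of `trials` at success rate p.

    Terms with a zero count are skipped, so p = 0 or p = 1 never hits log(0).
    """
    failures = trials - successes
    ll = 0.0
    if successes:
        ll += successes * math.log(p)
    if failures:
        ll += failures * math.log(1 - p)
    return ll


def mcfadden_pseudo_r2(groups):
    """groups maps a category name to (successes, trials).

    Assumes every group has trials > 0 and the overall success rate is
    strictly between 0 and 1 (otherwise the null log-likelihood is 0).
    """
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    overall_rate = total_successes / total_trials

    # Null model: one shared success rate for everyone.
    ll_null = bernoulli_log_likelihood(total_successes, total_trials, overall_rate)
    # Group model: each category gets its own observed success rate.
    ll_model = sum(
        bernoulli_log_likelihood(s, n, s / n) for s, n in groups.values()
    )
    return 1 - ll_model / ll_null


# Hypothetical cuts of the same 200 users (invented numbers).
flat_cut = {"a": (50, 100), "b": (50, 100)}      # grouping explains nothing
skewed_cut = {"a": (80, 100), "b": (20, 100)}    # groups differ a lot
perfect_cut = {"a": (100, 100), "b": (0, 100)}   # grouping explains everything

r2_flat = mcfadden_pseudo_r2(flat_cut)
r2_skewed = mcfadden_pseudo_r2(skewed_cut)
r2_perfect = mcfadden_pseudo_r2(perfect_cut)
print(r2_flat, r2_skewed, r2_perfect)
```

Sweeping a function like this over many candidate groupings and sorting by the result is one way the 'first pass' idea above could work: cuts that score near zero are probably not worth a manual look, while higher-scoring ones might be.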
The Missing Analytics Executive (Benn Stancil, December 10)
Out of all the posts that I mentioned in my newsletter this year, this one got the most responses from my friends and peers.
Benn really nailed the experience of running a team and its discontents. Whether you’re in ‘the room where it happens’ or not, most of us haven’t figured out how to make our work consequential to our organizations the way Engineering & Marketing are.
Without the gravity of a large organization underneath them, these data executives play bit parts, pushed to the back of the board deck and relegated to the always-too-high G&A budget, an administrative asterisk next to the departments that are seen as making real products or real money.
It’s not clear to me this is a big problem, for what it’s worth. Sure, it’s a problem for me, and the people Benn is talking about, granted. But I don’t hear a lot of HR or security leaders gnashing teeth and rending cloth because their teams aren’t ‘First Class Citizens’ (a concept I get asked about constantly and dislike immensely) at their company. There’s a whiff of diva (from the theoretical leader, not from the author) in the construction of the problem, to be candid. Data is important, really important, but a lot of things are really important.
Regardless, the question of ‘how to best serve an executive team with analytical support’ is a great one, and I’m glad Benn posed it. I understand the idea of the 'very senior IC in the leadership room'. I talked with the person I know who most closely fits that mold, and they had mixed feelings about this but couldn’t really articulate a better idea, and I think that’s where I am at the moment as well.
So that’s it! Here are some honorable mentions:
The Hard Lessons of Modeling the Coronavirus Pandemic (February 19)
On power markets, snow storms, and $16,000 power bills (February 26)
Democracy's Data Infrastructure (April 2)
How We Manage New York Times Readers’ Data Privacy (May 21)
Finding Clusters of NYPD Officers in CCRB Complaint Data (July 6)
Zillow, Prophet, Time Series, & Prices (November 8)
Estimating the Geographic Area of a Real Estate Agent (November 22)
Also, here are my favorite bits of trivia that I uncovered this year.
Your body burns around 1/250th of a calorie warming up the frozen air in each breath that you take (January 22)
This study on car seats as contraception (March 12):
We estimate that [car seat] laws prevented only 57 car crash fatalities of children nationwide in 2017. Simultaneously, they led to a permanent reduction of approximately 8,000 births in the same year,
The names of Donald Duck's nephews in various European languages (March 19)
The Governor of Utah offered a mea culpa for the state incorrectly calculating its vaccination stats. Counting stuff remains surprisingly difficult! (July 19)
And this is the dumbest game I wasted hours on: Iceberger (February 26)
Thanks so much for reading, and for being a newsletter subscriber (if you are one: if not, I’m amazed you made it this far and you can subscribe here). Happy Holidays, and best wishes to you and yours in 2022.