Mapping the US Banking System with D3.js

I snuck in one last data project for 2013: an interactive D3 map of the US banking system. You can play with it here. Please do and let me know what you think!

I was really happy with how the project came out. I was also really happy with GitHub Pages, which I tried out for the first time with this project. If you ever work with Git, I can’t recommend it enough. I’m hoping to move more of my projects there in the future.

 

A Startup’s Minimum Revenue Per Employee

In a couple of weeks I am going to be teaching a class on Data Visualization for Businesses (you should come!) and as part of the class prep I started thinking about key metrics that my students may want to visualize.

After weighing some of the options, I settled on Revenue Per Employee, which has been on my mind recently. I want to understand: what is the minimum revenue per employee that a quickly growing company can sustain? Continue reading

Why Detroit’s Bankruptcy is a Bigger Deal Than You Think

(Disclosure: I am involved with and invested in Lumesis, which sells muni credit & compliance software.)

So Detroit filed for bankruptcy last week.

If you live in the US and don’t work in municipal finance, you probably barely registered this news.

And you probably think that it’s not that surprising that something bad happened in Detroit. I mean, during the recession a house there cost less than a car. So we all saw this coming, right?

Well, yes and no.

Yes, everyone in the muni market knew that Detroit was a poor credit. It’s not a surprise, and that’s not why it’s a big deal. Continue reading

Analyzing ‘Analyzing the Analyzers’: an Analysis

Analyzing the Analyzers is a recently published report by Harlan Harris, Sean Patrick Murphy and Marck Vaisman, documenting the results of a 2012 survey of ‘several hundred’ data scientists.

The report is free and just 25 pages of text, plus an appendix. You should read it.

The authors’ central contention is that there is not one set of skills that organizations should look for in a data scientist. Instead, there are four distinct skill groupings that you will find in the ‘data science’ world:

- Data Businesspeople: managers primarily focused on their organization and the bottom line

- Data Creatives: hackers who feel comfortable with the entire data pipeline, from extraction to presentation

- Data Developers: back-end and infrastructure engineers primarily working on data extraction, storage, and scale issues

- Data Researchers: academics, usually with a strong background in statistics

(If you are interested in the skill sets considered and how you would be categorized, you can check out the authors’ quick survey.) Continue reading

The Dashboard Lifecycle

I once asked my brother, who studied large organizations, which was more effective: the hierarchical, top-down organization of, say, Apple, or the distributed decision-making of, say, Urban Outfitters.

My brother said “both”.

Apparently, the best way to capture the benefits of hierarchies (order, coordination) and delegated authority (reaction speed, creativity) was to cycle between the two. There was generally no single best system for any one organization, not even for very large organizations with stable missions.

Change was best. Even though it imposes high switching costs, change is best.

That conversation occurred to me this week as I looked at the dashboard that I provide my team, updating them on the state of our business. My dashboards generally shift from being very simple to being much more complex, until we all agree it’s time for a different look and we burn them down again.

Now I’m starting over again with a new dashboard, and I’ve realized that this process has repeated itself often enough that I recognize a cycle, which I’m calling, super-creatively, ‘The Dashboard Lifecycle’. It goes like this:

Creation: It starts very simply. “What are our three top priorities or KPIs and how are we measuring them?” A first cut of a dashboard might be as simple as 3-5 numbers, tracked over time. People look at it and say “That will do for now, I guess”. I always think of the Dashing demo as a fine example of a dashboard in this stage.

Continue reading

The Real Issue With Excel

It has been over a week since we discovered that Carmen Reinhart and Kenneth Rogoff made some unusual modeling choices and an error in key calculations in their seminal paper Growth in a Time of Debt.

So far, Microsoft Excel has taken a lot of the criticism, with the damning image above used to prove the point. The press has had no problem coming up with other examples of Excel errors resulting in serious costs. Though I do think Ars Technica takes the cake with this image:

 

I’ve spent plenty of time working on financial models in Excel. In fact, I helped write a book about it. And yes, before you ask, there are (a small number of) errors in the book. My co-author keeps a list of errata for all of his books accessible online, so feel free to check it out and make fun.

However, I now work primarily in R and Python, using each for both accounting and forecasting purposes (as well as a bunch of other things, of course). I’ve even worked with SAS on one painful occasion. So I have at least a little perspective to opine on Excel and alternatives. Continue reading

$1 Trillion of Student Loan Debt

Today I learned that student loans recently surpassed credit cards to become the largest non-mortgage source of debt for US consumers. Not only that, but in the next quarter or two, total outstanding student loan debt will ominously hit $1 trillion.

image

(wikipedia)

The remarkable thing about this is not just how big student loan debt is, but how much it has grown in the last few years.

image

(chart shamelessly stolen from alphaville)

Two things jump out at me:

Continue reading

Proof that the Cosby Show is the Greatest TV Show of All Time, and How TV and the Web are Trading Places

Recently I took a look at the Wikipedia page for Nielsen ratings and noticed that they list out the most popular show for each year. The most popular show is getting less and less popular every year.

image

The top-rated show last year, Monday Night Football, had a rating of 12.9, which according to Nielsen means that 12.9% of US TV households watched, on average. The top-rated show 28 years before, Dallas, had a rating of almost exactly twice that, at 25.7. 28 years before that, I Love Lucy had a rating of about 50 (and an all-time high over 60 two years before that).

It looks like the Nielsen ratings of the top US show are roughly following an exponential decay curve. I did a quick fit using the nls function in R.

image

This gets a bit of a statistics caveat*, but overall I feel like this gives a pretty good sense of how ratings are trending. Based on the trend, by 2029 no show will earn a 10 rating, though frankly that might happen sooner. And if you really want to stretch the limit of time (and credibility), by 2100 no show will be watched by more than 2% of TV-watching households.
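For anyone who wants to poke at the idea, here is a rough sketch of this kind of fit. It is not the fit from the post: that one used R's nls on the full Wikipedia series, while this swaps in a plain log-linear least-squares fit in Python on just the three ratings quoted above (with "about 50" taken as exactly 50), so its numbers will not match the chart. The function names, and the choice to measure time in years since the I Love Lucy season, are assumptions made for illustration.

```python
import math

# Illustrative data only: the three ratings quoted in the text.
# t is years since the I Love Lucy season (an assumed time origin).
points = [(0, 50.0), (28, 25.7), (56, 12.9)]


def fit_exponential_decay(data):
    """Fit rating = a * exp(-k * t) by least squares on log(rating)."""
    n = len(data)
    ts = [t for t, _ in data]
    ys = [math.log(r) for _, r in data]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(rating) against t.
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    a = math.exp(y_mean - slope * t_mean)
    return a, -slope


def predict(a, k, t):
    """Rating implied by the fitted curve at time t."""
    return a * math.exp(-k * t)


def log_residual(a, k, t, rating):
    """How far a show sits above (+) or below (-) trend, in log points."""
    return math.log(rating) - math.log(predict(a, k, t))


a, k = fit_exponential_decay(points)
half_life = math.log(2) / k
residuals = [log_residual(a, k, t, r) for t, r in points]

print(f"a = {a:.1f}, k = {k:.4f}, half-life = {half_life:.1f} years")
```

Fitting on the log scale weights proportional misses equally across eras, which is the same reason residuals in log points are a fair way to compare a 1950s hit with a recent one. A nonlinear fit like nls minimizes errors in raw rating points instead, so the two approaches can give somewhat different curves on the full data.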

Below are some indications of which TV shows had the largest residuals: you can think of these as being the most off-trend. As you’ll see, the sweater reigns supreme when we correct for the era (using residuals in log points).

Continue reading