The Limits of Data: Weighted Edition

One tip for aspiring data hounds: always ask for weighted data.

What do I mean?

What’s the on-time performance for commuter rail?

In NY, it’s 96%! Huzzah, right?

Wrong. That figure counts trains, not passengers. Anecdotal evidence suggests that rush-hour trains, which are packed with people, are delayed far more often.
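To see how much the choice of weighting can matter, here is a minimal Python sketch. The numbers are made up for illustration and are not NY's actual ridership or delay data. The `Trip` type and both function names are hypothetical. It compares the share of trains that ran on time with the share of riders whose train ran on time.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    passengers: int  # riders on board (hypothetical figures)
    on_time: bool


def train_weighted_otp(trips):
    """Share of trains that ran on time; every train counts once."""
    return sum(t.on_time for t in trips) / len(trips)


def passenger_weighted_otp(trips):
    """Share of riders whose train ran on time; each train counts by its load."""
    total = sum(t.passengers for t in trips)
    on_time = sum(t.passengers for t in trips if t.on_time)
    return on_time / total


# Hypothetical day: 96 lightly loaded off-peak trains, all on time,
# plus 4 packed rush-hour trains, all late.
trips = (
    [Trip(passengers=100, on_time=True) for _ in range(96)]
    + [Trip(passengers=1500, on_time=False) for _ in range(4)]
)

print(f"Per train:     {train_weighted_otp(trips):.0%}")
print(f"Per passenger: {passenger_weighted_otp(trips):.0%}")
```

Same trains, same delays: counted per train, the hypothetical railroad hits 96%, but counted per passenger it falls to about 62% (9,600 of 15,600 riders), because the few late trains carry most of the people.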

Because it does not weight on-time data by passenger, NY’s on-time performance figure fails to tell the whole story, and it fails the credibility test with the most important user group of all: the customers.