Providing independent coverage of the Minnesota Wild.

The MN Wild Zone Entry/Exit Project, Hubris Personified

Posted by hvs, 13 December 2014

The best laid schemes of mice and men
Go often askew,
And leave us nothing but grief and pain,
For promised joy!

- "To a Mouse" by Robert Burns (1785)

At the beginning of this season, I had a plan to track all of the zone entries and exits for the Wild for the entire season and make that data available to the public. The idea was that it would improve analysis of our favorite team and fill a gap not being addressed anywhere else. While it was a noble goal, I did not know at the time that it was also incredibly unrealistic.

It became apparent almost immediately that tracking every event at the level of detail I was attempting would take almost twice as long as the game itself. It is often necessary to stop, rewind, and replay an event multiple times to make sure you have assigned it to the correct player and the correct event type. Complicating matters further, events are often not as black and white as we would like to believe. Dumping, carrying, passing, and chipping into the zone become points on a spectrum rather than discrete events as you collect more and more data. Add to this the inexorable pressure of time, other commitments, and sheer boredom, and you have a recipe for failure.

That said, the project was not a complete and utter failure (just very close to one). I came up with a way of tracking events in a game that I think is fairly easy to use both for tracking games manually and for processing with analytical tools. I have a better understanding of what is required to track these events, which may come into play in future projects I am considering to make this tracking easier to accomplish. And I learned that Corey Sznajder (of the Carolina Hurricanes blog Shutdown Line, and now an employee of an unnamed NHL team) is an alien from another planet, having tracked every game of the 2013-14 season.

A Format for Tracking Hockey Events

So, what does this format for tracking events look like? It's very straightforward and requires nothing more than a text editor to create and update (no Excel, Google Spreadsheets, etc. required). It also allows for additions to the format as new events, or changes to existing events, are needed. Here is an example:


Right now you are thinking, "Wait, I thought you said this was going to be simple?" It is, once you know what you are looking at.

Keep in mind that the format needs to be as minimal as possible. When you are watching these games and there is an event roughly every five seconds, the last thing you want to do is spend 15-20 seconds arrow-keying through an Excel file. With approximately 400 events in every game, every additional second you spend on an event adds 6-7 minutes to tracking the game. So if you pause at each event and spend 15 seconds tracking it, you will spend 100 minutes per game, or over an hour and a half, on tracking alone. That does not include the time you spend actually watching the game. It is also tricky to fast-forward to events because they happen so often and so quickly (plus, if you are using NHL GameCenter Live rather than a DVR, the buffering of the stream makes it harder to land on an event exactly). This is why the format is compact: it is meant to limit the time you spend tracking as much as possible.
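
The arithmetic behind those figures can be checked with a few lines of Python. This is only a rough sketch: the 400-event count is the approximation quoted above, and the function name `tracking_minutes` is made up for illustration.

```python
# Rough tracking-time estimate using the approximate figures quoted in the post.
EVENTS_PER_GAME = 400  # approximate number of events per game


def tracking_minutes(seconds_per_event: float, events: int = EVENTS_PER_GAME) -> float:
    """Return the minutes spent logging events, excluding viewing time."""
    return seconds_per_event * events / 60


# Cost of each additional second spent per event (between 6 and 7 minutes).
print(round(tracking_minutes(1), 1))
# Cost of pausing for 15 seconds at every event (100 minutes).
print(round(tracking_minutes(15), 1))
```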

The file starts with a version number (in this case "v1"). This allows tools to know what type of file this is. If changes are made to the format in the future, they can be handled appropriately while remaining backwards-compatible.
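
As a minimal sketch of how a tool might use that version tag, the snippet below reads it and refuses anything it does not recognize. It assumes the tag sits alone on the first non-blank line; the names `SUPPORTED_VERSIONS` and `read_version` are hypothetical and not taken from any existing tool.

```python
SUPPORTED_VERSIONS = {"v1"}  # only "v1" is mentioned in the post


def read_version(text: str) -> str:
    """Return the version tag from the first non-blank line.

    Assumes the tag is alone on that line. Raises ValueError for an
    empty file or a version this sketch does not know about.
    """
    for line in text.splitlines():
        tag = line.strip()
        if tag:
            if tag not in SUPPORTED_VERSIONS:
                raise ValueError(f"unsupported format version: {tag!r}")
            return tag
    raise ValueError("empty file: no version line found")
```

A tool that later supports a newer version could branch on the returned tag and keep the "v1" code path unchanged.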

Next is a block of basic game data. Listed are the home and visiting teams as well as the date of the game. I used team codes from Hockey-Reference.com because, well, I manage that site and I think they make a lot of sense.

Then you get to the meat of the file, the event data. This should be read as a stream of events occurring over time.

The period is indicated with a "p". So, when you see "p,1" that means "period 1 begins," when you see "p,2" that means "period 2 begins", etc.

"20:00,str,5v5" indicates the team strengths have changed at the given time. Obviously, at the beginning of any game, the strength is 5-on-5. The strengths are indicated as visitor vs. home, so "18:34,str,4v5" would indicate that 1:26 into the current period (the clock counts down to make it easy to track from the game video) the visiting team was down a man to the home team. This strength continues to apply to all events after this line until a new "str" line occurs in the file.

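The "p" and "str" lines described above act as running state for everything that follows them. Here is a rough sketch of how a reader might carry that state through the stream. The function names are made up for illustration, a 20-minute period is assumed, and the sketch expects the header lines (version and game data) to have been removed already.

```python
def elapsed(clock: str, period_seconds: int = 20 * 60) -> str:
    """Convert a countdown clock such as "18:34" to time elapsed in the period."""
    minutes, seconds = (int(part) for part in clock.split(":"))
    gone = period_seconds - (minutes * 60 + seconds)
    return f"{gone // 60}:{gone % 60:02d}"


def track_state(lines):
    """Yield (period, strength, line) for every line that is not "p" or "str".

    strength is a (visitor, home) tuple, matching the visitor-vs-home order
    of the "str" lines. Blank lines are skipped.
    """
    period = None
    strength = None
    for raw in lines:
        line = raw.strip()
        fields = line.split(",")
        if fields[0] == "p":
            period = int(fields[1])
        elif len(fields) == 3 and fields[1] == "str":
            visitor, home = fields[2].split("v")
            strength = (int(visitor), int(home))
        elif line:
            yield period, strength, line


print(elapsed("18:34"))  # 1:26
```

Each yielded line arrives tagged with the period and strength in effect at that point, so later processing never has to look backwards in the file.
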
Now we get to the events that we really care about, zone entries and exits. Zone entries are indicated by events that begin with an "e", while zone exits are indicated by an "x". They are both followed by the team that caused the event, either an "h" for home or a "v" for visitor. This is followed by the jersey number of the player on the team. For instance, the code "h20" in this game would indicate Ryan Suter. Next, they are followed by a "type code". This is different for entries and exits and is the component most likely to change in the future. While I was tracking games I came up with the following codes:

[Screenshot: table of entry and exit type codes]

Finally, the "+" at the end of an event indicates that the event occurred "under pressure." This is a highly subjective designation, but it attempts to indicate whether an entry or exit was contested (e.g. an opposing player was pressuring) or whether the player was skating generally unopposed.
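
To make the pieces of an event concrete, here is a hedged sketch of a parser for a single event line. The exact field layout is an assumption: it supposes comma-separated fields in the order clock, event letter, team-plus-jersey, type code, with an optional trailing "+". The type code "c" in the usage line is a placeholder, not one of the actual codes, and `ZoneEvent` and `parse_event` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ZoneEvent:
    clock: str      # countdown clock, written the same way as on "str" lines
    kind: str       # "entry" or "exit"
    team: str       # "home" or "visitor"
    jersey: int     # jersey number of the player
    type_code: str  # kept as written; its meaning comes from the type code table
    pressure: bool  # True when the event ends with "+"


KINDS = {"e": "entry", "x": "exit"}
TEAMS = {"h": "home", "v": "visitor"}


def parse_event(line: str) -> ZoneEvent:
    """Parse one event line in the ASSUMED layout clock,kind,player,type[+]."""
    clock, kind, player, type_code = line.strip().split(",")
    return ZoneEvent(
        clock=clock,
        kind=KINDS[kind],
        team=TEAMS[player[0]],
        jersey=int(player[1:]),
        type_code=type_code.rstrip("+"),
        pressure=type_code.endswith("+"),
    )


# "c" is a made-up type code used only for this illustration.
event = parse_event("19:42,e,h20,c+")
print(event.kind, event.team, event.jersey, event.pressure)
```

Keeping the type code as an opaque string means the parser does not need to change when codes are added or renamed, which matters because that component is the one most likely to change.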


So, the MN Wild tracking project was ultimately a failure, largely due to my naiveté about the amount of work required to do it. However, it was a useful project from my perspective and will ultimately guide any future projects that I do with the Wild.

Update (2014-12-16):

Since I got a couple of requests for a way to convert this format into something useful, I've made my Python scripts for processing these files available on GitHub: https://github.com/hansvs/pyhockey.

You can follow me on Twitter at @cantpitch.