Dev Talk: Managing Racing Data with Terraform and AWS


Racing with Data

Racing is big in Kentucky. Whether you are into NASCAR or horse racing, we can accommodate.

One project we are working on cashes in on this sector with a data-driven betting tool.

Every day there are races happening all over the world at different times and people place wagers on all of them.

If you are into race wagers, there may be a multitude of different reasons why you like one competitor over another, and you get your competitive information from several publicly available data sources.

However, you cannot bet on every race whenever you want; you can only place wagers at appointed times. These times may move earlier or later during the day depending on the efficiency of the track.

To make wagering easier and to generate automatic updates on races, we built a bi-directional communication system that automatically checks the authoritative data sources for the racing events every 15-30 seconds and allows our client to place wagers at the appropriate time. Once we have received the data, we store it in AWS data centers and use Terraform to scale our cloud computing resources up or down automatically.
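The polling half of that system can be sketched as a small loop. This is a minimal sketch, not the project's actual code: the `fetch` callable, the `post_time` field used in the usage note, and the choice of a random wait inside the 15-30 second window are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Iterator


def next_delay(min_seconds: float = 15.0, max_seconds: float = 30.0) -> float:
    """Pick a wait time inside the 15-30 second polling window."""
    return random.uniform(min_seconds, max_seconds)


def poll(
    fetch: Callable[[], dict],
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Call `fetch` repeatedly, yielding a result only when it has changed."""
    last = None
    for i in range(max_polls):
        current = fetch()
        if current != last:
            # Something moved (for example, the post time), so pass it on.
            yield current
            last = current
        if i < max_polls - 1:
            sleep(next_delay())
```

In use, `fetch` would wrap a request to one of the authoritative data sources and return something like `{"post_time": "14:10"}`. Passing `sleep` in as a parameter keeps the loop easy to test without waiting in real time.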

In this particular project we are taking multiple streams of data from several different sources. One stream might describe who is participating in a race, and another how many competitors are in it.

Our system is constantly grabbing and updating data in our operational database (PostgreSQL) and structuring that data in a consistent way so that our client can be sure they always have the most recent information.
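As a rough illustration of "structuring the data in a consistent way," the sketch below maps records from differently shaped sources onto one set of keys and then inserts or updates the stored row. Everything specific here is hypothetical: the field names, the `races` table, and the sample values are invented, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs on its own (PostgreSQL accepts a similar `INSERT ... ON CONFLICT` statement).

```python
import sqlite3

# Hypothetical field names that different data sources might use.
FIELD_ALIASES = {
    "race_id": ("race_id", "raceId", "id"),
    "track": ("track", "track_name"),
    "entrants": ("entrants", "num_entrants", "field_size"),
}


def normalize(raw: dict) -> dict:
    """Map a source-specific record onto one consistent set of keys."""
    record = {}
    for field, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                record[field] = raw[alias]
                break
        else:
            raise KeyError(f"missing field: {field}")
    record["entrants"] = int(record["entrants"])
    return record


def upsert_race(conn: sqlite3.Connection, record: dict) -> None:
    """Insert the race, or overwrite the stored row if it already exists."""
    conn.execute(
        """
        INSERT INTO races (race_id, track, entrants)
        VALUES (:race_id, :track, :entrants)
        ON CONFLICT (race_id) DO UPDATE SET
            track = excluded.track,
            entrants = excluded.entrants
        """,
        record,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE races (race_id TEXT PRIMARY KEY, track TEXT, entrants INTEGER)"
)
# Two sources report the same race with different shapes; the later one wins.
upsert_race(conn, normalize({"raceId": "R1", "track_name": "Example Downs", "field_size": "8"}))
upsert_race(conn, normalize({"race_id": "R1", "track": "Example Downs", "entrants": 7}))
```

After both calls the table holds a single row for race `R1` with the most recent entrant count, which is the "always the most recent information" behavior described above.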

Enabling Data Analytics

There is a joke in Louisville, home of the Kentucky Derby (I actually heard this at Churchill Downs). How do you become a millionaire in horse racing? Start out a multi-millionaire :)

Future Millionaire on Board (Update)

Having data is great, but the real key is analyzing it so you can predict future results. Ultimately, this is what our client is doing with our data. They've built tools that allow them to analyze how the heats, tracks, and races are actually going to play out. For instance, if a particular dirt track consistently runs 10 minutes behind its scheduled race times, we ship all that data to them and keep it updated so they can ask how far behind this track runs on average, or which position at each track tends to win. A large part of this project has been to facilitate moving all that data from the operational tables into our ETL (extract, transform, load) tool for analytics.
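The "how far behind is this track on average" question can be sketched in a few lines. This is only an illustration of the kind of aggregate the client might compute: the record fields (`track`, `scheduled_start`, `actual_start`) and the sample rows are made up, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_delay_minutes(races):
    """Return the mean start delay in minutes for each track."""
    delays = defaultdict(list)
    for race in races:
        late_by = race["actual_start"] - race["scheduled_start"]
        delays[race["track"]].append(late_by.total_seconds() / 60)
    return {track: mean(values) for track, values in delays.items()}


# Made-up sample rows; real data would come from the analytics tables.
sample = [
    {"track": "Track A",
     "scheduled_start": datetime(2020, 8, 1, 14, 0),
     "actual_start": datetime(2020, 8, 1, 14, 8)},
    {"track": "Track A",
     "scheduled_start": datetime(2020, 8, 1, 15, 0),
     "actual_start": datetime(2020, 8, 1, 15, 12)},
    {"track": "Track B",
     "scheduled_start": datetime(2020, 8, 1, 13, 0),
     "actual_start": datetime(2020, 8, 1, 13, 2)},
]

averages = average_delay_minutes(sample)
print(averages)
```

With these sample rows, Track A averages 10 minutes late and Track B averages 2 minutes late.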

Data Analytics Tables v Operational Tables

The analytics tables are structured very differently than the operational tables so that we can query a large data set and aggregate the results.

Analytics tables
  • Need more storage
  • More expensive to run
  • Faster at fetching information
  • Less normalization (allowing for faster analysis on big data sets)
  • For analyzing a large data set

Operational tables
  • Take less storage
  • Less expensive to run
  • Slower to fetch information as there is more data attached
  • More data normalization
  • Provide more data on each individual record in a database
  • For inspecting individual records

Ultimately the trade-off is between storage size (how much it is going to cost you to store the data) and how much it is going to cost you to calculate and retrieve the data.

In most cases in the analytics world, we care less about storage costs and more about the speed of fetching information. To use a racing example, if we're talking about a particular car in the race, the operational database has to make sure that "cars" actually exists, because the records concerning cars can only exist if the "cars" they refer to exist. In our analytics tables, we don't really care that much about the existence of "cars."
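The "cars has to exist" point is what a foreign key enforces. The sketch below contrasts a normalized operational layout with a flattened analytics table. The table and column names are hypothetical, and `sqlite3` is used only so the example runs on its own; the project's operational database is PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Operational side: normalized, so every result must point at an existing car.
conn.execute("CREATE TABLE cars (car_id INTEGER PRIMARY KEY, driver TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE results (
        result_id INTEGER PRIMARY KEY,
        car_id INTEGER NOT NULL REFERENCES cars (car_id),
        finish_position INTEGER NOT NULL
    )
    """
)

# Analytics side: denormalized, so car details are copied into each row.
conn.execute(
    "CREATE TABLE results_flat (car_id INTEGER, driver TEXT, finish_position INTEGER)"
)

conn.execute("INSERT INTO cars VALUES (1, 'Driver A')")
conn.execute("INSERT INTO results (car_id, finish_position) VALUES (1, 2)")

# Car 99 was never created, so the operational table refuses the row.
try:
    conn.execute("INSERT INTO results (car_id, finish_position) VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The flat analytics table has no such check and accepts the row as-is.
conn.execute("INSERT INTO results_flat VALUES (99, 'Driver Z', 1)")
```

The flat table trades that safety check and some duplicated data for queries that need no joins, which is the storage-versus-retrieval trade-off described above.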

Resource management with Terraform and AWS

So we have our data and databases, but these all have to live somewhere.

For this particular project we decided to run our servers in the Amazon cloud so that we would have the flexibility to quickly scale our resources up and down.

With our racing app, there are times when lots of races are happening and need to have data processed, and there are times when little is happening, so we need to be able to scale on demand and automatically.

One way of spinning up new servers in times of need is to have an engineer ask AWS, by hand, to spin a server up or down.

You can log in to the AWS console, click around, and say, "create a Virtual Machine for me," and it will walk you through a cool little wizard to make your changes.

This is okay if you're spinning up one server, but in this project we have almost a dozen, and they all basically do the same thing.

Terraform lets us spin servers up and down automatically: we write an intermediary configuration file that declares what we want our environment to look like.

Terraform will query AWS and say, "oh, it looks like that server exists, so no changes are needed," or "the server you want does not exist, so I will make it."
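The compare-and-reconcile idea described above can be illustrated with a toy diff between a declared state and what currently exists. To be clear, this is a conceptual sketch in Python, not how Terraform is implemented, and real Terraform configuration is written in Terraform's own configuration language rather than Python. The server names and the `size` setting are hypothetical.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare declared servers with existing ones and list the needed actions."""
    actions = {"create": [], "update": [], "destroy": [], "no_change": []}
    for name, config in desired.items():
        if name not in actual:
            # "The server you want does not exist, so I will make it."
            actions["create"].append(name)
        elif actual[name] != config:
            actions["update"].append(name)
        else:
            # "It looks like that server exists, so no changes are needed."
            actions["no_change"].append(name)
    for name in actual:
        if name not in desired:
            actions["destroy"].append(name)
    return actions


# What we declare we want (hypothetical names and settings).
desired = {f"worker-{i}": {"size": "small"} for i in range(3)}

# What a query of the cloud account might report as already running.
actual = {
    "worker-0": {"size": "small"},
    "worker-1": {"size": "large"},
    "old-worker": {"size": "small"},
}

result = plan(desired, actual)
print(result)
```

Here `worker-0` needs no change, `worker-1` would be updated, `worker-2` would be created, and `old-worker` would be destroyed. Because the declaration is the source of truth, describing a dozen near-identical servers becomes a matter of changing the declared state rather than clicking through a wizard a dozen times.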

By using AWS and Terraform for our cloud infrastructure, our client gets the computing power they need at the right time, automatically, and does not have to pay for extra servers when they are not in use.

If you need help building or managing an app with AWS and Terraform, please contact us.

If you want to learn more about work we have done building and managing databases, take a look at our blog on a recent project migrating data with PostgreSQL and SQLServer databases.

Originally published on 2020-08-10 by Royce Hall
