kt analytics

Another Data Scraper

November 20, 2023

I’ve always been passionate about sports, but I never went deep into the stats. I’d eyeball them week to week for fantasy, but I never really spent time analyzing the data the way I would professionally.

With the proliferation of sports gambling apps and the websites supporting the industry, I thought I’d dive into the data, grade some professional gamblers’ picks, and learn some new tools along the way.

I decided to start with football and basketball. I had already been using GCP for GA4 data in BigQuery, and now I wanted to practice building out a data lake and warehouse, with Looker Studio as the visualization tool. Maybe I’ll even build an application on top to allow more features than Looker Studio can provide.

What I’ve decided to do:

APIs first - then save the raw response - then format and save as a table.

Within GCP, I’m using Cloud Functions to run Python scripts that scrape websites and APIs, saving the raw responses as JSON files for backup and, in the same process, starting some of the transformation by converting the data into tabular format for storage as CSVs. The files are kept in Cloud Storage, with BigQuery external tables built on top of the CSV data.
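Each function follows the same pattern. Here’s a minimal sketch of that flow, assuming a hypothetical scores API; the URL, bucket name, and field names below are placeholders, not the project’s actual ones:

```python
# Minimal sketch of the scrape-and-store step.
# The API URL, bucket name, and JSON field names are all assumptions.
import csv
import io
import json
from datetime import date

import requests
from google.cloud import storage


def scrape_and_store(request):
    """Cloud Function entry point: fetch an API, back up the raw JSON,
    then flatten it to CSV where the BigQuery external table points."""
    bucket = storage.Client().bucket("my-sports-data-lake")  # hypothetical bucket
    today = date.today().isoformat()

    # 1. Pull the raw response and save it untouched as a backup.
    resp = requests.get("https://api.example.com/nfl/scores", timeout=30)
    resp.raise_for_status()
    data = resp.json()
    bucket.blob(f"raw/nfl/{today}.json").upload_from_string(
        json.dumps(data), content_type="application/json"
    )

    # 2. Start the transformation: flatten nested records into flat rows.
    rows = [
        {
            "game_id": g["id"],
            "home_team": g["home"]["name"],
            "away_team": g["away"]["name"],
            "home_score": g["home"]["score"],
            "away_score": g["away"]["score"],
        }
        for g in data["games"]  # field names are assumptions
    ]
    if not rows:
        return "no games today"

    # 3. Write the tabular version to the prefix the external table reads.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    bucket.blob(f"tables/nfl_scores/{today}.csv").upload_from_string(
        buf.getvalue(), content_type="text/csv"
    )
    return "ok"
```

With the external table defined over the tables/ prefix, each day’s CSV shows up in BigQuery automatically, with no load jobs to manage.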

This is where I introduce DBT to perform the last of the required ELT/ETL (I’d call this a hybrid approach) for use in Looker Studio. DBT lets me complete the transformations and then create views or tables as necessary.
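For a sense of what that looks like, here’s a rough sketch of the kind of DBT model involved; the model, staging tables, and column names are hypothetical, not the project’s real ones:

```sql
-- models/marts/pick_grades.sql (hypothetical names throughout)
-- Grade each pick against the final result staged from the external tables.
{{ config(materialized='table') }}

select
    p.picker,
    p.game_id,
    p.picked_team,
    case when p.picked_team = s.winning_team then 1 else 0 end as pick_correct
from {{ ref('stg_picks') }} as p
join {{ ref('stg_scores') }} as s
    on p.game_id = s.game_id
```

Materializing a model like this as a table keeps the Looker Studio side cheap, since the grading logic runs once per DBT run instead of on every dashboard load.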

Everything in GCP has been easy to configure. Debugging some of my Python code was the most time-consuming part, but even that is easier than ever with ChatGPT. I knew that developing the entire process on my own would involve a lot of trial and error, and taking time to build familiarity with the raw datasets and how I’d be transforming the data. It’s been great to get a hands-on refresher and think through different use cases.

More updates to come as the project moves along!


Aris
