Independent Project - PredictedIt - Project Notes

[ predictedit notes ]

Project Overview

I have a personal interest in politics as well as economics. I find I am especially intrigued by the idea of markets to predict outcomes, and in the past I’ve played around with placing political bets on the website PredictIt.org.

I would love to practice my data analysis skills on the PredictIt market data, and even learn a bit about quantitative trading without wading into the rat's nest that is the stock market. Unfortunately, the PredictIt API is pretty minimal, so for this project I am setting out to: archive data from PredictIt using the cheapest, most abstracted AWS services I can find; structure the data and make it accessible to others through a cheap but scalable microservices architecture; and analyze the data myself, potentially practicing some quant-trading algorithms along the way.

Project Goals

Learn how to build and release an AWS-based architecture
Practice data acquisition, cleaning, and structuring
Practice data analysis
Learn a bit about common quantitative trading techniques
Archive PredictIt pricing data and make it easily accessible for others

Requirements

Minimize costs
Limit cloud resources to AWS because:
- We are an AWS shop at work, so this aids professional development
- I am planning to take the AWS Certified Developer - Associate Certification Exam
Make the data I archive easily accessible and ensure access is scalable
Follow data fault-tolerance and security best practices

POC Architecture:

TODO(nadir.sidi): Add a Lucidchart diagram here

ToDo:

Data Acquisition

Review the existing PredictIt API endpoints using Postman
Write a Python Lambda function to hit the API and stash the data
Set up a deployment pipeline
- Set up AWS CodeBuild with hooks to the GitHub repo that holds the Lambda
- Learn about CloudFormation/SAM template files to handle deployment from the S3 artifact
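
A minimal sketch of what the fetch-and-stash Lambda might look like. The endpoint URL, the bucket name `predictedit-raw`, and the key layout are all assumptions for illustration, not settled decisions; the fetch and store steps are passed in as arguments so the handler can be exercised locally without network or AWS access.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint; confirm it in Postman before relying on it.
API_URL = "https://www.predictit.org/api/marketdata/all/"
BUCKET = "predictedit-raw"  # hypothetical bucket name


def fetch_markets(url=API_URL):
    """Download the raw response body from the API as bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def put_to_s3(bucket, key, body):
    """Write bytes to S3. boto3 is imported lazily so local tests run without it."""
    import boto3

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)


def handler(event, context, fetch=fetch_markets, store=put_to_s3, now=None):
    """Lambda entry point: grab one snapshot and stash it unmodified."""
    now = now or datetime.now(timezone.utc)
    body = fetch()
    json.loads(body)  # fail fast if the response is not valid JSON
    key = now.strftime("raw/%Y%m%dT%H%M%SZ.json")
    store(BUCKET, key, body)
    return {"key": key, "bytes": len(body)}
```

Storing the response untouched keeps the Lambda trivial and leaves every cleaning decision for later, at the cost of some extra storage.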

Data Storage

Determine an initial partition scheme for stashing PredictIt data
Investigate transforming the data to store into a DB
Write an Athena query to read data directly from S3
- Is it bad practice to write an API that hits the Athena schema-on-read directly?
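
One candidate for the partition scheme is a Hive-style key layout (`name=value` path segments), which Athena can map to partition columns. The sketch below is only an illustration: the `markets` prefix, the `dt`/`hour` partition names, and the file naming are hypothetical choices.

```python
from datetime import datetime, timezone


def partition_key(ts, prefix="markets"):
    """Build a Hive-style S3 key (dt=/hour= partitions) for one snapshot.

    Expects a timezone-aware datetime; it is converted to UTC so keys
    sort consistently regardless of where the snapshot was taken.
    """
    ts = ts.astimezone(timezone.utc)
    return (
        f"{prefix}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"
        f"snapshot-{ts:%Y%m%dT%H%M%SZ}.json"
    )


print(partition_key(datetime(2018, 10, 1, 12, 30, tzinfo=timezone.utc)))
# markets/dt=2018-10-01/hour=12/snapshot-20181001T123000Z.json
```

Partitioning by date and hour would let a query restricted to one day scan only that day's objects, which matters because Athena bills by data scanned.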

Data Access

Investigate querying data from Athena
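
As a starting point for that investigation, here is a rough sketch of running a query through the boto3 Athena client: start the query, poll until it finishes, then read the first page of results. The client is passed in (in real use it would be `boto3.client("athena")`), and the database name, table, and output location in the usage comment are hypothetical.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one Athena query and return the first page of rows as lists of strings."""
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena is asynchronous, so poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is read here; for SELECT queries the first row
    # holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Hypothetical usage:
# rows = run_athena_query(
#     boto3.client("athena"),
#     "SELECT * FROM markets WHERE dt = '2018-10-01' LIMIT 10",
#     database="predictedit",
#     output_location="s3://predictedit-athena-results/",
# )
```

The polling loop is the part that bears on the API question above: every request would wait on an asynchronous query, so latency and per-query cost are worth measuring before putting an API directly in front of Athena.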

Topics

Written on October 1, 2018