Topics

notes R shiny predictedit databricks aws

Independent Project - PredictedIt - Project Notes

[ predictedit  notes  ]

Project Overview

I have a personal interest in politics as well as economics. I am especially intrigued by the idea of using markets to predict outcomes, and in the past I’ve played around with placing political bets on the website PredictIt.org.

I would love to practice my data analysis skills on the PredictIt market data, and even learn a bit about quantitative trading without wading into the rat's nest that is the stock market. Unfortunately, the PredictIt API is pretty minimal, so for this project I am setting out to: archive data from PredictIt using the cheapest, most abstracted AWS services I can find; structure the data and make it accessible to others through a cheap but scalable microservices architecture; and analyze the data myself, potentially practicing some quant-trading algorithms along the way.

Project Goals

  • Learn how to build and release an AWS-based architecture
  • Practice data acquisition, cleaning, and structuring
  • Practice data analysis
  • Learn a bit about common quantitative trading techniques
  • Archive PredictIt pricing data and make it easily accessible for others

Requirements

  • Minimize costs
  • Limit cloud resources to AWS because:
    • We are an AWS shop at work, so this aids professional development
    • I am planning to take the AWS Certified Developer - Associate Certification Exam
  • Make the data I archive easily accessible and ensure access is scalable
  • Follow best practices for data fault tolerance and security

POC Architecture:

TODO(nadir.sidi): Add a Lucidchart diagram here

ToDo:

Data Acquisition

  • Review the existing PredictIt API endpoints using Postman
  • Write a Python Lambda function to hit the API and stash the data
  • Set up a deployment pipeline
    • Set up AWS CodeBuild with hooks to the GitHub repo that holds the Lambda
    • Learn about AWS SAM templates (an extension of CloudFormation) to handle deployment from an S3 artifact
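
A minimal sketch of what the Lambda from the list above might look like. The endpoint URL, the bucket name, and the `raw/` key layout are assumptions for illustration, not settled choices; check the endpoint in Postman first. The fetch and store steps are passed in as callbacks so the core logic can be exercised without network or AWS access.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint and bucket name; verify both before relying on them.
API_URL = "https://www.predictit.org/api/marketdata/all/"
BUCKET = "my-predictit-archive"  # hypothetical bucket name


def fetch_snapshot(url=API_URL):
    """Download the raw API response body as bytes."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def snapshot_key(now):
    """Build a timestamped object key, e.g. raw/20181001T120000Z.json."""
    return "raw/" + now.strftime("%Y%m%dT%H%M%SZ") + ".json"


def archive_snapshot(fetch, put_object, now):
    """Fetch one snapshot and hand it to the storage callback unmodified."""
    body = fetch()
    json.loads(body)  # fail fast if the response is not valid JSON
    key = snapshot_key(now)
    put_object(key, body)
    return key


def lambda_handler(event, context):
    """Lambda entry point: wire the real HTTP fetch and S3 client together."""
    import boto3  # imported here so the helpers above stay testable without AWS

    s3 = boto3.client("s3")

    def put_object(key, body):
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)

    key = archive_snapshot(fetch_snapshot, put_object, datetime.now(timezone.utc))
    return {"stored": key}
```

Storing the response body untouched keeps the archive faithful to the source, so any restructuring can be redone later if the first schema turns out to be wrong.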

Data Storage

  • Determine initial partition-scheme for stashing PredictIt data
  • Investigate transforming the data to store into a DB
  • Write an Athena query to read data directly from S3
    • Is it bad practice to write an API that hits the Athena schema-on-read tables directly?
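
One possible starting point for the partition scheme and the transformation step, sketched under assumptions: the keys use Hive-style `year=/month=/day=` prefixes, which Athena can map to partition columns, and the snapshot is assumed to be a JSON object with a `markets` list whose items each carry a `contracts` list. The field names (`id`, `lastTradePrice`) and the `snapshots` prefix are illustrative and should be checked against a real response.

```python
import json
from datetime import datetime, timezone


def partitioned_key(now, prefix="snapshots"):
    """Return a Hive-style key like snapshots/year=2018/month=10/day=01/120000.json."""
    return (
        f"{prefix}/year={now:%Y}/month={now:%m}/day={now:%d}/"
        f"{now:%H%M%S}.json"
    )


def to_json_lines(snapshot, captured_at):
    """Flatten markets -> contracts into one JSON object per line."""
    lines = []
    for market in snapshot.get("markets", []):
        for contract in market.get("contracts", []):
            row = {
                "captured_at": captured_at.isoformat(),
                "market_id": market.get("id"),
                "contract_id": contract.get("id"),
                # Assumed field name; missing values become null.
                "last_trade_price": contract.get("lastTradePrice"),
            }
            lines.append(json.dumps(row))
    return "\n".join(lines)
```

Newline-delimited JSON with one contract per row is a format Athena can read in place, which would postpone the decision about loading the data into a database.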

Data Access

  • Investigate querying data from Athena
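
A rough sketch of querying Athena from Python with boto3. The database name, table name, column names, and results bucket are all hypothetical, and the partition columns are assumed to be declared as integers in the table definition. The request is built as a plain dictionary so it can be inspected before anything is sent to AWS.

```python
def athena_request(year, month, day, database="predictit",
                   output="s3://my-athena-results/"):
    """Build the keyword arguments for Athena's start_query_execution call."""
    # int() rejects anything non-numeric before it reaches the SQL text.
    query = (
        "SELECT captured_at, market_id, contract_id, last_trade_price "
        "FROM contract_prices "  # hypothetical table name
        f"WHERE year = {int(year)} AND month = {int(month)} AND day = {int(day)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output},
    }


def start_query(request):
    """Submit the query and return its execution id; results arrive asynchronously."""
    import boto3  # imported here so athena_request can be used without AWS set up

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

Athena queries run asynchronously, so a caller would need to poll `get_query_execution` until the query finishes and then read rows with `get_query_results`. That latency is worth weighing when deciding whether an API should sit directly on top of Athena.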

Written on October 1, 2018