Using shinytest with htmlwidgets DataTables

What is shinytest and Why Should You Care?

At work, I am part of a team that develops and maintains in-house R packages and shiny apps for distributed energy modeling. Being able to develop and iterate on the modeling code alongside the web app (full-stack analytics development, all in R) lets us move fast to support our sales engineers as they help build the growing distributed energy industry.

As our tools have become more business-critical, we need to demonstrate that we are building “production” software. That label has always been a chip on the shoulder of R developers, and at my company the problem is made worse by past projects. However, the R community, led strongly by RStudio, keeps pushing back on this image with tools designed for robustness. Production software is a set of best practices and processes, not a specific feature of a language.

The shinytest package is under active development by the excellent team at RStudio, with the heaviest contributions from Winston Chang and Gábor Csárdi. It provides a way to automate functional tests for a shiny app and run them in a headless browser (specifically PhantomJS). By adopting automated testing, a developer can release with more confidence that the application behaves as intended and that new changes have not broken existing functionality.

Custom R Packages on Databricks

Problem Overview

The Databricks platform is a great solution for data wonks who want to write polyglot notebooks that combine tools like Python, R, and, most importantly, Spark. It is easy to experiment in a notebook and then scale it up to a more production-ready solution using features like scheduled jobs running on AWS clusters.

In my case, I need to use an ecosystem of custom, in-house R packages, hosted on our internal GitHub Enterprise server, to interact with various internal services. Databricks allows users to manage packages using Libraries, but currently only R packages that are hosted on a CRAN server can be installed.

In this post I walk through my process for POSTing a custom R package to the Databricks File System (DBFS) and installing it on each node of a cluster using a Cluster Node Initialization Script (init script).
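As a rough illustration of those two steps, here is a minimal Python sketch, not the exact process from the post. It assumes the DBFS REST API's `/api/2.0/dbfs/put` endpoint with base64-encoded inline `contents` (which Databricks caps at 1 MB, so larger tarballs would need the streaming create/add-block/close endpoints). It also assumes that DBFS is visible on each node under `/dbfs`. The host, token, package path, and helper names are all hypothetical. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical constant: the DBFS "put" endpoint of the Databricks REST API 2.0.
DBFS_PUT_ENDPOINT = "/api/2.0/dbfs/put"


def build_dbfs_put_request(host, token, local_bytes, dbfs_path):
    """Build (but do not send) a POST that uploads a file to DBFS.

    `dbfs_path` is an absolute DBFS path such as /FileStore/pkgs/x.tar.gz.
    Inline `contents` must be base64-encoded and is limited to 1 MB.
    """
    if not dbfs_path.startswith("/"):
        raise ValueError("dbfs_path must be an absolute DBFS path")
    payload = {
        "path": dbfs_path,
        "contents": base64.b64encode(local_bytes).decode("ascii"),
        "overwrite": True,
    }
    return urllib.request.Request(
        url=host.rstrip("/") + DBFS_PUT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def init_script_for(dbfs_path):
    """Return a bash init script that installs the uploaded tarball.

    Assumes DBFS is mounted on every cluster node at /dbfs, so the file
    uploaded to /FileStore/... is readable at /dbfs/FileStore/....
    """
    return "#!/bin/bash\nR CMD INSTALL /dbfs" + dbfs_path + "\n"
```

Sending the request (for example with `urllib.request.urlopen(req)`) and registering the generated script as the cluster's init script are left out, because both depend on how a given workspace is configured.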

AWS Serverless Application Model (SAM)

Overview

Define the CloudFormation stack for your serverless application with a SAM template written in YAML or JSON. AWS CloudFormation can interpret the SAM definition and deploy the application automatically.

Under the hood, a SAM template is a CloudFormation template. The SAM-specific sections are expanded into standard CloudFormation resources by the transform the template declares, AWS::Serverless-2016-10-31.
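To make the "SAM is just CloudFormation plus a transform" point concrete, here is a minimal sketch that builds a SAM template as a Python dict and serializes it to JSON, one of the two formats SAM accepts. The `Transform` value and the `AWS::Serverless::Function` type come from SAM. The resource name `HelloFunction`, the handler, the runtime, and `CodeUri` are hypothetical placeholders.

```python
import json


def minimal_sam_template(handler="app.handler", runtime="python3.12", code_uri="src/"):
    """Return a minimal SAM template as a dict (placeholder values)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # This single line is what turns a plain CloudFormation template into
        # a SAM template: CloudFormation expands AWS::Serverless::* resources.
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            # Hypothetical function resource; SAM expands it into a Lambda
            # function plus the IAM role it needs.
            "HelloFunction": {
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "Handler": handler,
                    "Runtime": runtime,
                    "CodeUri": code_uri,
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON form that could be handed to CloudFormation / SAM CLI.
    print(json.dumps(minimal_sam_template(), indent=2))
```

Apart from the `Transform` line and the serverless resource type, everything in the output is ordinary CloudFormation. That is why a SAM template can also hold regular CloudFormation resources side by side.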

Independent Project - PredictedIt - Project Notes

Project Overview

I have a personal interest in politics as well as economics. I find I am especially intrigued by the idea of markets to predict outcomes, and in the past I’ve played around with placing political bets on the website PredictIt.org.

I would love to practice my data analysis skills on the PredictIt market data, and even learn a bit about quantitative trading without wading into the rat's nest that is the stock market. Unfortunately, the PredictIt API is pretty minimal, so for this project I am setting out to: archive data from PredictIt using the cheapest abstracted AWS services I can find; structure the data and make it accessible to others through a cheap but scalable microservices architecture; and analyze the data myself, potentially practicing some quant-trading algorithms along the way.
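The archiving step could start out like the sketch below, which is not the project's actual code. It fetches one snapshot of all markets and flattens the nested markets-and-contracts structure into one row per contract, stamped with the capture time, so that repeated snapshots can be appended to an archive. The endpoint URL and the field names (`markets`, `contracts`, `id`, `name`, `lastTradePrice`) are assumptions about PredictIt's public feed and should be checked against a live response. The helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed URL of PredictIt's public "all markets" feed; verify before use.
PREDICTIT_ALL_URL = "https://www.predictit.org/api/marketdata/all/"


def fetch_snapshot(url=PREDICTIT_ALL_URL):
    """Download one JSON snapshot of all markets (makes a network call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def flatten_snapshot(snapshot, captured_at=None):
    """Flatten {"markets": [{"contracts": [...]}]} into one row per contract.

    Field names are assumptions about the feed's schema; missing keys
    become None instead of raising.
    """
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    rows = []
    for market in snapshot.get("markets", []):
        for contract in market.get("contracts", []):
            rows.append({
                "captured_at": captured_at,
                "market_id": market.get("id"),
                "market_name": market.get("name"),
                "contract_id": contract.get("id"),
                "contract_name": contract.get("name"),
                "last_trade_price": contract.get("lastTradePrice"),
            })
    return rows


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)
```

Newline-delimited JSON is one convenient choice here. Each scheduled capture can be written as its own small object (for example to cheap object storage) and queried or concatenated later, without rewriting earlier snapshots.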
