Project: A Crime Analysis of NYC over the Last Decade
Big Data Project with Apache Spark + Amazon EMR
The following are the slides and written report for the Big Data Seminar course. The project was written with Spark and deployed first on Databricks, then on Amazon EMR. The packages involved are SparkML, developed by the Apache Spark team, and Azure Machine Learning, developed by Microsoft.
I used the Databricks Community Edition, and the EMR usage cost around $5 (paid by the school). If you are interested in replicating the results yourself, feel free to use my code from the following links:
PySpark Code for Visualization and Exploratory Analysis
PySpark Code for Feature Engineering and Modeling
Project Slides
Project Report