NYPD Crime #18 – Clustering To Explore Neighbourhoods (Part III – Continued Because Spark Hates Me)

Review: To sum up the last post, our driver’s RAM was the bottleneck, and it was what caused our Spark application and the underlying JVM to crash. Previously, we were using three AWS m4.large (8 GB RAM) boxes: one master and two workers. In this notebook, I’ve spawned a new cluster keeping my workers the […]

NYPD Crime #5 – Apache Spark

Intro: Oh man… SPARK… Where to start? First of all, pretty much everything in this post is based on this seminar by Sameer Farooqui from Databricks. It’s a full-day seminar, and yes, I watched all 6 hours. Me after watching the video: Me realizing where my day has gone: Hey, this is the life […]
