NYPD Crime #19 – Clustering To Explore Neighbourhoods (Part IV – Continued Because Spark Hates Me)

Review: I ran into a major dead end in the last post. The problem? Data pre-processing… You don’t often think that data processing would be the activity that prevents you from moving forward, eh? As a novice data scientist, you’re so infatuated with the high-level objective, the meat of the analysis, the sexy chart […]


NYPD Crime #18 – Clustering To Explore Neighbourhoods (Part III – Continued Because Spark Hates Me)

Review: To sum up the last post, our driver’s RAM was the bottleneck, and it was what caused our Spark application and the underlying JVM to crash. Before, we were using 3 AWS m4.large (8 GB RAM) boxes for our master + 2 workers. In this notebook, I’ve spawned a new cluster keeping my workers the […]
