NYPD Crime #20 – Conclusion

What A Journey: It basically took me 20 posts to come to some conclusions and then kind of contradict them. In terms of the data set itself, I definitely found some interesting insights, but my understanding of the topic kinda followed this trajectory: In [1]: # I’m starting to think the single graph this setup […]


NYPD Crime #18 – Clustering To Explore Neighbourhoods (Part III – Continued Because Spark Hates Me)

Review: To sum up the last post, our driver’s RAM was the bottleneck, and it was what caused our Spark application and the underlying JVM to crash. Before, we were using 3 AWS m4.large (8GB RAM) boxes for our master + 2 workers. In this notebook, I’ve spawned a new cluster keeping my workers the […]


NYPD Crime #2 – Distributed Computing & Amazon EMR

Distributed Computing: Before we jump straight into EMR (Elastic MapReduce, where “MapReduce” refers to a distributed computing engine that is more or less obsolete these days) or Hadoop, let’s just spend a few minutes (well, paragraphs) talking about distributed computing in general. Distributed computing is the idea of distributing your computing […]


NYPD Crime #1 – Intro

Yello Hello! I’m back for another random data science project to try, fail, and sometimes succeed at! Key word… probably “fail”. All good, though, I’ve failed so many times now that it’s basically become second nature! In the last post, I tried to build a Convolutional Neural Network to tell my girlfriend’s face apart from mine. […]
