NYPD Crime #5 – Apache Spark

Intro: Oh man… SPARK… Where to start? First of all, pretty much everything in this post is based on this seminar by Sameer Farooqui from Databricks. It’s a full-day seminar, and yes, I watched all 6 hours. [Image: me after watching the video] [Image: me realizing where my day has gone] Hey, this is the life […]


NYPD Crime #2 – Distributed Computing & Amazon EMR

Distributed Computing: Before we jump straight into EMR (Elastic MapReduce, where “MapReduce” refers to a distributed computing engine that is more or less obsolete these days) or Hadoop, let’s spend a few minutes (well, paragraphs) talking about distributed computing in general. Distributed computing is the idea of distributing your computing […]


NYPD Crime #1 – Intro

Yello Hello! I’m back for another random data science project to try, fail, and sometimes succeed at! Key word… probably “fail”. All good, though; I’ve failed so many times now that it’s basically become second nature! In the last post, I tried to build a Convolutional Neural Network to tell my girlfriend’s face apart from mine. […]
