NYPD Crime #20 – Conclusion

What A Journey It basically took me 20 posts to come to some conclusions and then kind of contradict them. In terms of the data set itself, I definitely found some interesting insights, but my understanding of the topic kinda followed this trajectory: In [1]: # I’m starting to think the single graph this setup […]

NYPD Crime #19 – Clustering To Explore Neighbourhoods (Part IV – Continued Because Spark Hates Me)

Review I ran into a major dead end in the last post. The problem? Data pre-processing… You don’t often think that data processing would be the activity that prevents you from moving forward, eh? As a novice data scientist, you’re so infatuated with the high-level objective, the meat of the analysis, the sexy chart […]

NYPD Crime #18 – Clustering To Explore Neighbourhoods (Part III – Continued Because Spark Hates Me)

Review To sum up the last post, our driver’s RAM was essentially the bottleneck and what was causing our Spark application and the underlying JVM to crash. Before, we were using 3 AWS m4.large (8GB RAM) boxes for our master + 2 workers. In this notebook, I’ve spawned a new cluster keeping my workers the […]

NYPD Crime #15 – Takeaways

Themes, yes. What we’ve all been waiting for in this post after 14 posts of mostly troubleshooting haha. To review some of the plots we created with datashader: Figure 1: NYC Crime Density Figure 2: NYC Crime Density (Top 3% Most Dense Regions Highlighted) Figure 3: NYC Crime Density by Offense Level Figure 4: NYC […]

NYPD Crime #14 – Data Exploration (Part VIIII – Lat Long Visualization By Category)

Review This is a direct continuation of last post, so I won’t review too much. Last time, we mapped NYPD complaint / offense density geographically using datashader. We explored the ideas of color gradients, color gradient floors, histogram equalization, and looking at top x% of densities. Here, we will look at categorical mapping, which allows […]
