Statistically Insignificant

Trying, Failing, and Sometimes Succeeding at Data Science

Hello!

My name is Chi Wang and I’m an aspiring data nerd from Alberta, Canada! I’m a consultant at Deloitte Canada by day and a complete mess by night, usually experimenting with music, playing basketball, or tinkering with machine learning. This blog aims to sharpen my data science skills by trying, failing, and sometimes succeeding!

Projects

  • 1. All-NBA Predict (32)
  • 2. Music Genre Clustering (7)
  • 3. Edmonton Property Assessment (5)
  • 4. Chi / Larissa Face Detection (18)
  • 5. NYPD Crime (20)

Tag: parquet

NYPD Crime #13 – Data Exploration (Part VIII – Datashader Deep Dive)

Review: Last post, we looked at loading a parquet, installing datashader, and using datashader. We ran into enough issues to make a post out of those items. Parquet: On the parquet side, I became familiar with the parquet file format and the partitioned nature of data storage within distributed computing platforms (in this case, I used […]

NYPD Crime #12 – Data Exploration (Part VII – Lat Long Visualization)

Review: In the last 2 posts, we reviewed all of the interesting fields, largely using Spark and Spark SQL (very handy). All of them except latitude and longitude. I ended the last post puzzled about how to actually plot this many points (5 million points!). Spark didn’t have anything to do this, so I had […]

NYPD Crime #10 – Data Exploration (Part V – Date-Time Exploration)

Intro: At this point, we’ve gotten a feel for the data, but we really still don’t know what’s in it. After 9 posts, we’re finally ready to actually explore haha… Oh, the miraculous world of data science… Similar to the All-NBA prediction project I was doing, I really have no objective here. I came in […]

NYPD Crime #9 – Data Exploration (Part III – Feature Building)

Let’s get right into it. If you’ve just landed on this post, please read the past few “Data Exploration” posts on this project to understand the context of what I’m trying to do here in this post. Note that this post is almost identical to the last with the exception of replacing all 24:00:00 time […]
