Timeline of a Sketch Spreadsheet video

This 6-minute video shows an interactive user session using the Sketch Spreadsheet running on 63 machines and browsing a dataset consisting of 1.665 billion tweets.
Minute Explanation
0:00 load data
0:12 load data clicked
0:15 metadata loaded; 1.66 billion rows; no data displayed
0:20 display timezone column
0:25 data displayed, sorted on timezone column. There are 438M rows with no timezone, 5.6M rows with timezone Abu Dhabi, etc.
0:34 displayed histogram of CreatedAt column. Some tweets have a date of 1801; start zooming into the last bar using the mouse
0:58 zoomed into January 1 and 2, 2013
1:01 assigned each bucket a different color based on date
1:02 drag-and-drop yellow intersection sign from CreatedAt histogram to spreadsheet; choose "intersection".
1:11 intersection data has 1.63 billion rows
1:21 histogram AdultScore column (values between 0 and 1)
1:24 drag-and-drop colors from AdultScore histogram to CreatedAt Histogram
1:27 result is a 2D histogram of CreatedAt, where each bar is divided into colors according to AdultScore
1:41 sort descending on AdultScore (second sort column remains Timezone, not visible on screen) - lexicographic sort on 2 columns
1:43 grouped visible columns to the left
1:44 there are 1996 rows with an adult score of 1 and an empty timezone; 2 rows with an adult score of 1 and an Abu Dhabi timezone, etc.
1:55 add SpamScore column to sort order, in first position (data now sorted on 3 columns). There are 402 rows with SpamScore 0, AdultScore 1 and no Timezone.
2:04 Draw heatmap of AdultScore vs Timezone
2:19 Heatmap drawn; color shows density
2:25 chosen logarithmic colors for density. Each pixel shows a count of points; counts vary between 2 (cyan) and 48M (orange)
2:30 zoom into lower-left corner of heatmap (A-H timezones, 0-0.2 AdultScore)
2:35 New heatmap drawn; density between 2 and 33M/pixel
2:58 show tweet Text column; loading takes 25 seconds, since most of the data is in this column (some video excised; rows re-sorted alphabetically on tweet text).
3:24 first tweets shown have funny unicode characters
3:28 add a new computed column to the spreadsheet (Map computation). Name: Length, Type: Integral, Code: row.text.Length (C# code)
4:03 New column computed and added
4:10 histogram Length column
4:13 Length histogram displayed; Length goes up to 510 characters!
4:17 Zoom into tweets with long length; histogram of tweets with length > 500 displayed (2884 tweets)
4:19 Intersect these tweets with spreadsheet to see text of long tweets
4:22 Long tweets displayed: they all have quoted XML characters
4:27 back button pressed: display previous set of 1.63 billion tweets in spreadsheet (instantaneous redisplay of cached rendering)
4:30 back button for Length spreadsheet
4:35 Zoomed into tweets with length 0-150
4:54 In CreatedAt window zoom into tweets on Jan 2 only
5:02 drag-and-drop color from CreatedAt to Length. Grey bars show tweets that have length displayed but are not on Jan 2.
5:19 zoom into Lengths 0-140, display with 35 buckets
5:41 intersect Jan 2 dataset with Length dataset to display only lengths for Jan 2; 1.186 billion tweets left
5:38 normalize histogram bars to discover correlation between Length and CreatedAt. No strong correlation.
5:52 drag-and-drop colors from AdultScore onto Length to discover AdultScore/Length correlation
5:53 Short tweets tend to have smaller adult scores
6:00 end
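
The normalization step at 5:38 can be illustrated with a small sketch. When every bar of a colored (2D) histogram is rescaled to the same height, the color proportions of different bars can be compared directly: if the proportions look the same in every bar, the two columns are not strongly correlated. The Python below is only an illustration of that idea under stated assumptions, not the Sketch Spreadsheet's implementation; the function names, the bucket width, and the toy rows are all hypothetical.

```python
from collections import Counter


def stacked_histogram(rows, bucket_of, color_of):
    """Count rows per (bucket, color) pair, i.e. a 2D histogram."""
    counts = Counter()
    for row in rows:
        counts[(bucket_of(row), color_of(row))] += 1
    return counts


def normalize_bars(counts):
    """Rescale every bar so that its color segments sum to 1."""
    totals = Counter()
    for (bucket, _color), n in counts.items():
        totals[bucket] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


# Hypothetical toy rows: (tweet length, day of January 2013).
rows = [(20, 1), (25, 2), (28, 2), (90, 1), (95, 1), (97, 2), (130, 2), (135, 2)]

counts = stacked_histogram(
    rows,
    bucket_of=lambda r: r[0] // 50 * 50,  # length buckets of width 50 (assumed)
    color_of=lambda r: r[1],              # color each segment by day
)
shares = normalize_bars(counts)

# Print the share of each day within each length bucket.
for (bucket, day), share in sorted(shares.items()):
    print(f"length {bucket}-{bucket + 49}, Jan {day}: {share:.2f}")
```

Without normalization, tall bars dominate the picture and the color split of short bars is hard to read; after normalization only the proportions remain, which is what makes a correlation (or its absence) visible.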