| Time (min:sec) | Explanation |
| --- | --- |
| 0:00 | load data |
| 0:12 | load data clicked |
| 0:15 | metadata loaded; 1.66 billion rows; no data displayed |
| 0:20 | display timezone column |
| 0:25 | data displayed, sorted on timezone column. There are 438M rows with no timezone, 5.6M rows with timezone Abu Dhabi, etc. |
| 0:34 | displayed histogram of CreatedAt column. Some tweets have a date of 1801; start zooming into the last bar using the mouse |
| 0:58 | zoomed into January 1 and 2, 2013 |
| 1:01 | assigned each bucket a different color based on date |
| 1:02 | drag-and-drop yellow intersection sign from CreatedAt histogram to spreadsheet; choose "intersection". |
| 1:11 | intersection data has 1.63 billion rows |
| 1:21 | histogram AdultScore column (values between 0 and 1) |
| 1:24 | drag-and-drop colors from AdultScore histogram to CreatedAt Histogram |
| 1:27 | result is a 2D histogram of CreatedAt, where each bar is divided into colors according to AdultScore |
| 1:41 | sort descending on AdultScore (second sort column remains Timezone, not visible on screen) - lexicographic sort on 2 columns |
| 1:43 | grouped visible columns to the left |
| 1:44 | there are 1996 rows with an adult score of 1 and an empty timezone; 2 rows with an adult score of 1 and an Abu Dhabi timezone, etc. |
| 1:55 | add SpamScore column to sort order, in first position (data now sorted on 3 columns). There are 402 rows with SpamScore 0, AdultScore 1 and no Timezone. |
| 2:04 | Draw heatmap of AdultScore vs Timezone |
| 2:19 | Heatmap drawn; color shows density |
| 2:25 | chosen logarithmic colors for density. Each pixel shows a count of points; counts vary between 2 (cyan) and 48M (orange) |
| 2:30 | zoom into lower-left corner of heatmap (A-H timezones, 0-0.2 AdultScore) |
| 2:35 | New heatmap drawn; density between 2 and 33M/pixel |
| 2:58 | show tweet Text column; loading takes 25 seconds - most data is in this column (some video excised; rows re-sorted alphabetically on tweet text). |
| 3:24 | first tweets shown have unusual Unicode characters |
| 3:28 | add a new computed column to spreadsheet (Map computation). Name: Length, Type: Integral, Code: row.text.Length (C# code) |
| 4:03 | New column computed and added |
| 4:10 | histogram Length column |
| 4:13 | Length histogram displayed; Length goes up to 510 characters! |
| 4:17 | Zoom into tweets with long length; histogram of tweets with length > 500 displayed (2884 tweets) |
| 4:19 | Intersect these tweets with spreadsheet to see text of long tweets |
| 4:22 | Long tweets displayed: they all have quoted XML characters |
| 4:27 | back button pressed: display previous set of 1.63 billion tweets in spreadsheet (instantaneous redisplay of cached rendering) |
| 4:30 | back button for Length spreadsheet |
| 4:35 | Zoomed into tweets with length 0-150 |
| 4:54 | In CreatedAt window zoom into tweets on Jan 2 only |
| 5:02 | drag-and-drop color from CreatedAt to Length. Grey bars show tweets that have length displayed but are not on Jan 2. |
| 5:19 | zoom into Lengths 0-140, display with 35 buckets |
| 5:41 | intersect Jan 2 dataset with Length dataset to display only lengths for Jan 2; 1.186 billion tweets left |
| 5:38 | normalize histogram bars to discover correlation between Length and CreatedAt. No strong correlation. |
| 5:52 | drag-and-drop colors from AdultScore onto Length to discover AdultScore/Length correlation |
| 5:53 | Short tweets tend to have smaller adult scores |
| 6:00 | end |
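
Three of the operations in the walkthrough (the computed Length column at 3:28, the 35-bucket histogram over lengths 0-140 at 5:19, and the sorted view at 1:41 that reports a row count per distinct value combination) can be sketched on a toy table. This is only an illustration in Python, not the tool's own implementation or its C# map code: the sample rows, the function names `add_length`, `histogram`, and `grouped_sort`, and the choice to represent a missing timezone as an empty string are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy rows standing in for the tweet table; all values are invented.
rows = [
    {"text": "hi", "timezone": "", "adult_score": 1.0},
    {"text": "hello world", "timezone": "Abu Dhabi", "adult_score": 1.0},
    {"text": "", "timezone": "", "adult_score": 1.0},
    {"text": "x" * 140, "timezone": "", "adult_score": 0.2},
]


def add_length(rows):
    """Map computation: derive a Length column from the text column.

    Note: Python's len() counts code points, while C#'s string.Length
    counts UTF-16 code units, so the two can differ for some characters.
    """
    return [{**r, "length": len(r["text"])} for r in rows]


def histogram(values, lo, hi, buckets):
    """Count values into equal-width buckets over the closed range [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v <= hi:
            # The upper bound hi falls into the last bucket.
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts


def grouped_sort(rows):
    """Count rows per (adult_score, timezone) pair, then sort the pairs
    lexicographically: adult_score descending, timezone ascending."""
    counts = Counter((r["adult_score"], r["timezone"]) for r in rows)
    return sorted(counts.items(), key=lambda kv: (-kv[0][0], kv[0][1]))


with_length = add_length(rows)
length_hist = histogram([r["length"] for r in with_length], 0, 140, 35)
summary = grouped_sort(with_length)
```

Here `summary` plays the role of the spreadsheet view: one line per distinct (AdultScore, Timezone) combination with its row count, rather than one line per tweet, which is what keeps the display small even for billions of rows.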