Mihai Budiu's Web Page

Mihai Budiu
c/o Microsoft Research SVC
Mailing address: 1065 La Avenida
Mountain View, CA 94043

Phone: (650) 693 1440
Fax: (425) 936 7329
Cell: (412) 216 8430
my e-mail
Actual office address: 1288 Pear Avenue, building 6, room 2076.

I am a researcher at Microsoft Research in Silicon Valley.

I am working on various aspects of cloud computing infrastructure and applications:

  • Dryad [video] is the runtime engine for Microsoft Bing data analytics.
  • DryadLINQ is a compiler that allows you to program large clusters using .NET.
  • Performance analysis, monitoring, debugging and visualization of large clusters with Artemis.
  • Large-scale applications built on DryadLINQ: machine learning, data mining, image and video processing, combinatorial optimization, distributed data structures, etc. A particularly exciting application is training the real-time body tracking system for Xbox's Kinect.

I have also worked on program security and hardware critical path analysis.

I have a Ph.D. from Carnegie Mellon (2004). There I worked in compilers, computer architecture, reconfigurable hardware (PipeRench: virtualized reconfigurable hardware, licensed by two start-ups), and high-level synthesis (CASH: A Compiler for Application-Specific Hardware). My old CMU web page.

Here are some interviews I conducted: Chuck Thacker, Leslie Lamport, and Kurt Akeley.

Recent publications

These are my publications since I joined Microsoft; a complete list is in my CV. Here's my dblp record. The copyrights of these publications are owned by various organizations; the papers are reproduced here for your convenience. My coauthors.
ESOP 13
bibtex
The Compiler Forest
Mihai Budiu, Joel Galenson, and Gordon Plotkin
European Symposium on Programming (ESOP), Rome, Italy, March 16-24, 2013, 20 pages
TOCS 12
bibtex
Fay: Extensible Distributed Tracing from Kernels to Clusters
Úlfar Erlingsson, Marcus Peinado, Simon Peter, Mihai Budiu, and Gloria Mainar-Ruiz
Transactions on Computer Systems (TOCS), Vol. 30, no 4, November, 2012
An expanded version of the SOSP 2011 paper
HPDC 12
slides
bibtex
Putting A "Big-Data" Platform to Good Use: Training Kinect
Mihai Budiu
June 20, 2012
Keynote to the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC)
BIGLEARN 11
bibtex
Parallelizing the Training of the Kinect Body Parts Labeling Algorithm
Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio
Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011
SOSP 11
bibtex
Fay: Extensible Distributed Tracing from Kernels to Clusters
Úlfar Erlingsson, Marcus Peinado, Simon Peter, and Mihai Budiu
ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 23-26, 2011
MAPREDUCE 11
slides
bibtex
Parallelizing large-scale data processing applications with data skew: a case study in product-offer matching
Ekaterina Gonina, Anitha Kannan, John Shafer, and Mihai Budiu
International Workshop on MapReduce and its Applications (MAPREDUCE), San Jose, CA, June 8, 2011
HIPS 11
slides
bibtex
Monitoring and Debugging DryadLINQ Applications with Daphne
Vilas Jagannath, Zuoning Yin, and Mihai Budiu
International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), Anchorage, AK, May 20, 2011
CHAPTER 11
bibtex
Large-Scale Machine Learning using DryadLINQ
Frank McSherry, Yuan Yu, Mihai Budiu, Michael Isard, and Dennis Fetterly
Chapter in Scaling Up Machine Learning, Cambridge University Press, 2011
IPDPS 11
slides
bibtex
DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines
Mihai Budiu, Daniel Delling, and Renato Werneck
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Anchorage, AK, May 16-20, 2011
TR 10
bibtex
User interfaces for exploring multi-dimensional data sets
Mihai Budiu
Microsoft Research Technical Report, MSR-TR-2010-67, June, 2010
TISSEC 09
bibtex
Control-Flow Integrity principles, implementations and applications
Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti
ACM Transactions on Information and System Security (TISSEC), Vol. 13, no 1, 2009, pages 1-40
LADIS 09
slides
bibtex
Towards Automatic Policy Refinement in Repair Services for Large Distributed Systems
Moises Goldszmidt, Mihai Budiu, Yue Zhang, and Michael Pechuk
Large Scale Distributed Systems and Middleware (LADIS), Big Sky Resort, Big Sky, Montana, October 10-11, 2009, 5 pages
Also published in ACM SIGOPS Operating Systems Review vol 44 no 2, 2010, pp 47-51.
SOCC 09
slides
bibtex
Tuning SoCs using the Dynamic Critical Path
Hari Kannan, Mihai Budiu, John D. Davis, and Girish Venkataramani
IEEE International SOC Conference, Belfast, Northern Ireland, September 9-11, 2009
Also as Microsoft Research Technical Report MSR-TR-2009-44
HOTCLOUD 09
slides
bibtex
DryadInc: Reusing work in large-scale computations
Lucian Popa, Mihai Budiu, Yuan Yu, and Michael Isard
Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 15, 2009
WASL 08
slides
bibtex
Hunting for problems with Artemis
Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt
USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008
OSDI 08
bibtex
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008, 14 pages
TR 08
bibtex
Some sample programs written in DryadLINQ
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Frank McSherry, and Kannan Achan
Microsoft Research Technical Report, MSR-TR-2008-74, May, 2008, 37 pages
DAC 07
slides
bibtex
Critical Path: A Tool for System-Level Timing Analysis
Girish Venkataramani, Tiberiu Chelcea, Mihai Budiu, and Seth C. Goldstein
Design Automation Conference (DAC), San Diego, CA, June 4-8, 2007
An expanded version is in CMU-CS-06-144
EUROSYS 07
bibtex
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007, pages 59-72
Also as technical report MSR-TR-2006-140
OSDI 06
bibtex
XFI: Software Guards for System Address Spaces
Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C. Necula
Symposium on Operating Systems Design and Implementation (OSDI), Seattle, WA, November 6-8, 2006, pages 75-88
ASID 06
slides
bibtex
Architectural Support for Software-Based Protection
Mihai Budiu, Úlfar Erlingsson, and Martín Abadi
Workshop on Architectural and System Support for Improving Software Dependability (ASID), San Jose, CA, October 21, 2006, pages 42-51
Also as technical report MSR-TR-2006-115
ASPLOS 06
bibtex
Tartan: Evaluating Spatial Computation For Whole Program Execution
Mahim Mishra, Timothy J. Callahan, Tiberiu Chelcea, Girish Venkataramani, Mihai Budiu, and Seth C. Goldstein
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), San Jose, CA, October 21-25, 2006, pages 163-174
ICFEM 05
bibtex
A Theory of Secure Control-Flow
Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti
International Conference on Formal Engineering Methods (ICFEM), Manchester, UK, November 1-4, 2005, pages 111-124
CCS 05
bibtex
Control-Flow Integrity
Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti
ACM Conference on Computer and Communications Security (CCS), Alexandria, VA, November 7-11, 2005, pages 340-353

Some recent talks I gave:

Data-Intensive Cluster Computing
San Jose State University Course on Cloud Computing, San Jose, CA, April 14, 2014
Systems for Data-Intensive Cluster Computing
ALMADA Summer School, Moscow, Russia, July 30-August 7, 2013
Interactive Parallel Data Visualization and Exploration
Microsoft Research, Silicon Valley, Mountain View, CA, March 23, 2013
Data-Intensive Cluster Computing
San Jose State University Course 297C1 on Cloud Computing, Santa Clara, CA, April 18, 2013
Data-Intensive Cluster Computing
San Jose State University Course 297C1 on Cloud Computing, Santa Clara, CA, November 8, 2012
The Compiler Forest
Workshop on Exploiting Concurrency Efficiently and Correctly, Berkeley, CA, July 7, 2012
Playing with Big Data
HPDC 2012 Keynote Presentation, Delft, Netherlands, June 20, 2012
Training Kinect
UCSD Center for Networked Systems 2012 Research Review, San Diego, CA, Feb 8, 2012
Programming Clusters with DryadLINQ
Carnegie Mellon Silicon Valley, Mountain View, CA, Dec 5, 2011
Crunching Big Data
Microsoft Research 20th Anniversary, Mountain View, CA, Sept 27, 2011
Data-Intensive Cluster Computing
San Jose State University, Course CMPE 281: Special Topics in Computer Engineering (Cloud Technologies), Santa Clara, CA, September 21, 2011
Monitoring and Debugging Clusters with Daphne
International Workshop on High-level Parallel Programming Models and Supportive Environments (HIPS), Anchorage, AK, May 20, 2011
DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Anchorage, AK, May 19, 2011
Programming Clusters with DryadLINQ
ACCU monthly meeting, Mountain View, CA, April 13, 2011
The Kinect Body Tracking Pipeline
NASA Ames Research Center, Mountain View, CA, February 15, 2011
Cluster Computing with DryadLINQ
San Jose State University Course 297C1 on Cloud Computing, Santa Clara, CA, September 13, 2010
Large-Scale Machine Learning using DryadLINQ
Workshop on Ambient Intelligence: From Sensor Networks to Smart Environments and Social Media, Stanford University, June 1, 2010
Large-Scale Machine Learning using DryadLINQ
SIAM Workshop on High Performance Analytics, Columbus, OH, May 1, 2010
Cluster Computing with DryadLINQ
Cloudera, Inc., Burlingame, CA, February 12, 2010
Cluster Computing with DryadLINQ
Guest lecture for the graduate course Cloud Computing: Infrastructure, Services, and Applications, UC Berkeley, Berkeley, CA, March 4, 2009
Hunting for Bugs with Artemis
USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008
Cluster Computing for the masses with DryadLINQ
Intel Research/UC Berkeley System Seminar Series, Intel Research, Berkeley, CA, October 9, 2008
DryadLINQ - a language for data-parallel computation on computer clusters
Cloud Computing: the New Face of Computing, Promises and Challenges, Stanford, CA, July 19, 2008
Cluster Computing with DryadLINQ
Palo Alto Research Center CSL Colloquium, Palo Alto, CA, May 8, 2008
Cluster Computing with Dryad
Microsoft Live Labs, Bellevue, WA, March 4, 2008
A Toolkit for Machine-Learning in DryadLINQ
Microsoft Research SVC, Mountain View, CA, January 30, 2008
Critical Path: A Tool for System-Level Timing Analysis
Microsoft Research SVC, Mountain View, CA, May 23, 2007
Architectural Support for Software-Based Protection
ASID 2006, San Jose, CA, October 21, 2006
On the Critical Path of Computations
Tensilica, Inc., Santa Clara, CA, August 3, 2006
Spatial Computation
Google, Mountain View, CA, May 10, 2005
On the Critical Path of Computations
Microsoft Research Silicon Valley, Mountain View, CA, March 30, 2005
Dataflow: A Complement to Superscalar
ISPASS 2005, Austin, TX, March 22, 2005
Inter-Iteration Scalar Replacement in the Presence of Conditional Control-Flow
ODES 2005, San Jose, CA, March 20, 2005
Spatial Computation
ASPLOS 2004, Boston, MA, October 11, 2004
Translating ANSI C Into Asynchronous Circuits
Tutorial at ASYNC 2004, Hersonissos, Crete, Greece, April 19, 2004
On the Energy Efficiency of Computation
CALCM Seminar, Carnegie Mellon University, February 17, 2004