We introduce the participant to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Participants will learn how Google's PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes. We'll cover locality-sensitive hashing, a bit of magic that lets you find similar items in a set of items so large you cannot possibly compare every pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model the data, but standard approaches do not scale well; we'll talk about efficient approaches. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.

Course Syllabus

Week 1: MapReduce; Link Analysis (PageRank)
Week 2: Locality-Sensitive Hashing (basics and applications); Distance Measures; Nearest Neighbors; Frequent Itemsets
Week 3: Data Stream Mining; Analysis of Large Graphs
Week 4: Recommender Systems; Dimensionality Reduction
Week 5: Clustering; Computational Advertising
Week 6: Support-Vector Machines; Decision Trees; MapReduce Algorithms
Week 7: More About Link Analysis (topic-specific PageRank, link spam); More About Locality-Sensitive Hashing

Recommended Background

A course in database systems is recommended, as is a basic course on algorithms and data structures. You should also understand mathematics up to multivariable calculus and linear algebra.

Suggested Readings

There is a free book, "Mining of Massive Datasets," by Leskovec, Rajaraman, and Ullman (who, by coincidence, are the instructors for this course :-). You can download it at http://www.mmds.org/. Hardcopies can be purchased from Cambridge University Press.

Instructors

Jure Leskovec, Stanford University
Anand Rajaraman, Stanford University
Jeff Ullman, Stanford University
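As a small taste of one topic above, locality-sensitive hashing rests on the MinHash idea: compress each set into a short signature so that similar sets can be spotted without comparing every pair in full. The sketch below is purely illustrative and not taken from the course materials; the hash family h(x) = (a*x + b) mod p and all parameter choices are our own assumptions.

```python
import random

# Illustrative MinHash sketch (an assumption, not course code): estimate the
# Jaccard similarity of two sets from fixed-length signatures.
random.seed(42)
NUM_HASHES = 200
PRIME = 2_147_483_647  # a large prime for the hash family

# A hypothetical family of hash functions h(x) = (a*x + b) mod PRIME.
params = [(random.randrange(1, PRIME), random.randrange(0, PRIME))
          for _ in range(NUM_HASHES)]

def signature(items):
    """MinHash signature: for each hash function, the minimum hash over the set."""
    return [min((a * hash(x) + b) % PRIME for x in items) for a, b in params]

def estimate_jaccard(sig1, sig2):
    """The fraction of matching signature positions approximates Jaccard similarity."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)

a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over a lazy cat".split())
true_jaccard = len(a & b) / len(a | b)
est = estimate_jaccard(signature(a), signature(b))
```

Because signatures have fixed length, a system can bucket them (banding) and compare only items that collide, which is what makes the all-pairs comparison avoidable at scale.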