*******************************************

TITLE: "The Communication Network Within the Crowd"

SPEAKER: Jennifer Wortman Vaughan, Microsoft Research, New York City

TIME: 0905 - 0955

ABSTRACT: Since its inception, crowdsourcing has been treated as a black-box approach for soliciting labor from a crowd of workers. Furthermore, the “crowd” has been viewed as a group of independent workers. Recent studies based on in-person interviews have opened up the black box and shown that the crowd is not a collection of independent workers; instead, workers communicate and collaborate with each other. In this talk, I will describe our attempt to quantify this discovery by mapping the entire communication network of workers on Amazon Mechanical Turk, a leading crowdsourcing platform. We executed a task in which over 10,000 workers from across the globe self-reported their communication links to other workers, thereby mapping the communication network among them. Our results suggest that while a large percentage of workers indeed appear to be independent, there is a rich network topology over the rest of the population. That is, there is a substantial communication network within the crowd. We further examined how online forum usage relates to network topology, how workers communicate with each other via this network, how workers’ experience levels relate to their network positions, and how U.S. workers differ from international workers in their network characteristics. These findings have implications for requesters, workers, and platform providers. This talk is based on joint work with Ming Yin, Mary Gray, and Sid Suri.

BIO: Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City, where she studies algorithmic economics, machine learning, and social computing, with a recent focus on prediction markets and crowdsourcing. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department.
She completed her Ph.D. at the University of Pennsylvania in 2009 and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn’s 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers, and a handful of best paper or best student paper awards. In her “spare” time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which has been held each year since 2006.

*******************************************

TITLE: "The Minimax Rate for Adaptive Crowdsourcing"

SPEAKER: Sewoong Oh, University of Illinois at Urbana-Champaign

TIME: 1100 - 1145

ABSTRACT: Adaptive schemes, in which tasks are assigned based on the data collected thus far, are widely used in practical crowdsourcing systems to allocate the budget efficiently. However, existing theoretical analyses of crowdsourcing systems suggest that the gain from adaptive task assignment is minimal. To bridge this gap, we propose a new model for representing practical crowdsourcing systems that strictly generalizes the popular Dawid-Skene model, and we characterize the fundamental trade-off between budget and accuracy. We introduce a novel adaptive scheme that matches this fundamental limit. Our analysis relies on new techniques for studying the spectra of non-backtracking operators, using density evolution techniques from coding theory.

BIO: Sewoong Oh is an Assistant Professor of Industrial and Enterprise Systems Engineering at UIUC. He received his PhD from the department of Electrical Engineering at Stanford University. Following his PhD, he worked as a postdoctoral researcher at the Laboratory for Information and Decision Systems (LIDS) at MIT. He was co-awarded the Kenneth C. Sevcik Outstanding Student Paper Award at SIGMETRICS 2010, the Best Paper Award at SIGMETRICS 2015, and an NSF CAREER award in 2016.

*******************************************

TITLE: "Kaggle Competitions and The Future of Reproducible Machine Learning"

SPEAKER: Ben Hamner, Kaggle Co-founder & CTO

TIME: 1530 - 1630

ABSTRACT: At Kaggle, we’ve run hundreds of machine learning competitions and seen over 150,000 data scientists make submissions. One thing is clear: winning competitions isn’t random. We’ve learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage). In this talk, I’ll share what goes into a winning competition toolkit, along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. As collaboration has grown, we’ve seen reproducibility emerge as a key pain point in machine learning. It can be incredibly tough to rerun and build on a colleague’s work, public work, or even your own past work! We’re expanding our focus to build a reproducible data science platform that hits directly at these pain points. It combines versioned data, versioned code, and versioned computational environments (through Docker containers) to create reproducible results.

BIO: Ben Hamner is Kaggle’s co-founder and CTO. At Kaggle, he’s focused on creating tools that empower data scientists to frictionlessly collaborate on machine learning and share their results. He has worked with machine learning across many domains, including natural language processing, computer vision, web classification, and neuroscience.
Prior to Kaggle, Ben applied machine learning to improve brain-computer interfaces as a Whitaker Fellow at the École Polytechnique Fédérale de Lausanne in Lausanne, Switzerland. He graduated with a BSE in Biomedical Engineering, Electrical Engineering, and Math from Duke University.

*******************************************
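Background note: Sewoong Oh's abstract refers to the Dawid-Skene model of crowdsourced labeling. For attendees unfamiliar with it, the sketch below simulates the simple one-coin variant, in which worker j answers each binary task correctly with probability p_j, and compares plain majority voting against a reliability-weighted vote. The worker reliabilities, the one-coin simplification, and the log-odds weighting are illustrative assumptions for this sketch only; they are not the scheme presented in the talk.

```python
import math
import random

def simulate_dawid_skene(n_tasks=500, n_workers=11, reliabilities=None, seed=0):
    """Simulate binary labeling under the one-coin Dawid-Skene model:
    worker j answers correctly with probability p_j, independently per task."""
    rng = random.Random(seed)
    if reliabilities is None:
        # Illustrative mixed crowd: a few reliable workers among mostly noisy ones.
        reliabilities = [0.9, 0.9, 0.9] + [0.55] * (n_workers - 3)
    truth = [rng.choice([-1, 1]) for _ in range(n_tasks)]
    # Each row holds one task's answers from all workers.
    answers = [[t if rng.random() < p else -t for p in reliabilities] for t in truth]
    return truth, answers, reliabilities

def majority_vote(answers):
    # Unweighted majority over each task's answers (n_workers is odd, so no ties).
    return [1 if sum(row) >= 0 else -1 for row in answers]

def weighted_vote(answers, reliabilities):
    # Log-odds weights are optimal when reliabilities are known exactly;
    # Dawid-Skene-style EM instead estimates them from the data.
    w = [math.log(p / (1 - p)) for p in reliabilities]
    return [1 if sum(wi * a for wi, a in zip(w, row)) >= 0 else -1 for row in answers]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

truth, answers, rel = simulate_dawid_skene()
acc_mv = accuracy(majority_vote(answers), truth)
acc_wv = accuracy(weighted_vote(answers, rel), truth)
print(f"majority vote: {acc_mv:.3f}, weighted vote: {acc_wv:.3f}")
```

With this crowd, the weighted vote noticeably outperforms plain majority voting, which illustrates why estimating worker reliabilities (as Dawid-Skene-style methods do) matters when budgets are limited.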