Information

  • Location: NIPS 2016 Workshop, Barcelona, Spain
  • Room: Room 120 + 121
  • Date: 9 Dec, 2016
  • Resources: See more information on the CrowdML site
  • Contact: For any questions, please email nips16crowd AT gmail
  • Updates:
    • 2016-11-25: NIPS workshop registration closes on 27th Nov.
    • 2016-11-25: Abstracts of invited talks are available here!
    • 2016-11-25: Final schedule is now available here!
    • 2016-11-11: Poster format is max-width=36in (91cm), max-height=48in (122cm).
    • 2016-10-15: List of accepted papers is now available here!

Overview

Building systems that seamlessly integrate machine learning (ML) and human intelligence can greatly push the frontier of our ability to solve challenging real-world problems. While ML research usually focuses on developing more efficient learning algorithms, it is often the quality and amount of training data that predominantly govern the performance of real-world systems. This is only amplified by the recent popularity of large-scale and complex learning methodologies such as Deep Learning, which can require millions to billions of training instances to perform well. The recent rise of human computation and crowdsourcing approaches, made popular by platforms like Amazon Mechanical Turk and CrowdFlower, enables us to systematically collect and organize human intelligence. Crowdsourcing research itself is interdisciplinary, combining economics, game theory, cognitive science, and human-computer interaction.

The goal of this workshop is to bring crowdsourcing and ML experts together to explore how crowdsourcing can contribute to ML and vice versa. Specifically, we will focus on the design of mechanisms for data collection and ML competitions, and conversely, applications of ML to complex crowdsourcing platforms.

Organizers

Talks

Schedule

The final schedule is now available below (please refresh the page to see the latest version!). Abstracts of invited talks are available here.

Session 1 (0830 - 1030)

  • 0830 - 0900: Poster setup (setting up 20 posters will take time, so come early to grab the best spot!).
  • 0900 - 0905: Opening Remarks.
  • 0905 - 0955: Jennifer Wortman Vaughan, "The Communication Network Within the Crowd".
  • 0955 - 1005: Edoardo Manino, "Efficiency of Active Learning for the Allocation of Workers on Crowdsourced Classification Tasks".
  • 1005 - 1015: Yao-Xiang Ding, "Crowdsourcing with Unsure Option".
  • 1015 - 1025: Yang Liu, "Doubly Active Learning: When Active Learning meets Active Crowdsourcing".
Coffee + Posters (1030 - 1100)

Session 2 (1100 - 1230)

  • 1100 - 1145: Sewoong Oh, "The Minimax Rate for Adaptive Crowdsourcing".
  • 1145 - 1155: Matteo Venanzi, "Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems".
  • 1155 - 1205: Miles E. Lopes, "A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles".
  • 1210 - 1230: Ashish Kapoor, "Identifying and Accounting for Task-Dependent Bias in Crowdsourcing".
Lunch (1230 - 1400)

Session 3 (1400 - 1500)

  • 1400 - 1415: Boi Faltings, "Incentives for Effort in Crowdsourcing Using the Peer Truth Serum".
  • 1415 - 1430: David Parkes, "Peer Prediction with Heterogeneous Tasks".
  • 1430 - 1445: Jens Witkowski, "Proper Proxy Scoring Rules".
  • 1445 - 1500: Jordan Suchow, "Rethinking Experiment Design as Algorithm Design".
Coffee + Posters (1500 - 1530)

Session 4 (1530 - 1800)

  • 1530 - 1630: Ben Hamner (Kaggle), "Kaggle Competitions and The Future of Reproducible Machine Learning".
  • 1630 - 1800: Poster Session.

Accepted Papers

Research Track

Encore Track

Topics of Interest

Topics of interest in the workshop include (but are not limited to):

Crowdsourcing for Data Collection
Crowdsourcing is one of the most popular approaches to data collection for ML, and therefore one of the biggest avenues through which crowdsourcing can advance the state of the art in ML. We seek cost-efficient and fast data collection methods based on crowdsourcing, and ask how design decisions in these methods could impact subsequent stages of ML systems. Topics of interest include:

  • Basic annotation: What is the best way to collect and aggregate labels for unlabeled data from the crowd? How can we increase fidelity by flagging labels as uncertain based on crowd feedback? How can we do the above in the most cost-efficient manner? (A minimal aggregation sketch follows this list.)
  • Beyond simple annotation tasks: What is the most effective way to collect probabilistic data from the crowd? How can we collect data requiring global knowledge of the domain such as building Bayes net structure via crowdsourcing?
  • Time-sensitive and complex tasks: How can we design crowdsourcing systems to handle real-time or time-sensitive tasks, or those requiring more complicated work dependencies? Can we encourage collaboration on complex tasks?
  • Data collection for specific domains: How can ML researchers apply crowdsourcing principles to specific domains (e.g., healthcare) where privacy and other concerns are at play?
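
As a concrete illustration of the basic annotation question above, here is a minimal Python sketch of majority-vote label aggregation with uncertainty flagging. All names are ours, and the 0.7 flagging threshold is an arbitrary assumption; real systems typically use weighted or model-based aggregation instead.

    from collections import Counter

    def aggregate_labels(annotations, flag_threshold=0.7):
        # Majority-vote aggregation over crowd labels. Items whose
        # winning label gets less than flag_threshold of the votes
        # are flagged as uncertain (threshold is illustrative).
        results = {}
        for item_id, labels in annotations.items():
            label, count = Counter(labels).most_common(1)[0]
            confidence = count / len(labels)
            results[item_id] = (label, confidence, confidence < flag_threshold)
        return results

    # Example: three workers label two images; img1 is flagged (2/3 < 0.7).
    votes = {"img1": ["cat", "cat", "dog"], "img2": ["dog", "dog", "dog"]}
    print(aggregate_labels(votes))

Flagged items are natural candidates for requesting additional labels, which ties aggregation directly to the cost-efficiency question.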

Crowdsourcing ML Research via Competitions
Through the Netflix challenge and now platforms like Kaggle, we are seeing the crowdsourcing of ML research itself. Yet the mechanisms underlying these competitions are extremely simple. Here our focus is on the design of such competitions; topics of interest include:

  • What is the most effective way to incentivize the crowd to participate in ML competitions? Rather than the typical winner-takes-all prize, can we design mechanisms that make better use of the total research-hours devoted to a competition?
  • Competitions as recruiting: how would we design a competition differently if (as is often the case) the result is not a winning algorithm but instead a job offer?
  • Privacy issues with data sharing are one of the key barriers to holding such competitions. How can we design privacy-aware mechanisms which allow enough access to enable a meaningful competition?
  • Challenges arising from the sequential and interactive nature of competitions, e.g., how can we maintain unbiased leaderboards without allowing participants to overfit the held-out test set? (A leaderboard sketch follows this list.)
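
One known mitigation for leaderboard overfitting is a ladder-style leaderboard in the spirit of Blum and Hardt (ICML 2015): a submission's public score is updated only when it beats the current best by a margin, which limits how much information repeated submissions leak about the held-out set. The sketch below is a simplification (the function name and the eta value are ours; the full mechanism also rounds reported scores).

    def ladder_leaderboard(submission_scores, eta=0.01):
        # Report a new public score only when a submission improves
        # on the current best by more than eta; otherwise echo the
        # previous best, so noisy resubmissions cannot climb the board.
        best, reported = float("-inf"), []
        for score in submission_scores:
            if score > best + eta:
                best = score
            reported.append(best)
        return reported

    # Example: the 0.705 and 0.719 fluctuations are suppressed.
    print(ladder_leaderboard([0.70, 0.705, 0.72, 0.719, 0.75]))
    # -> [0.70, 0.70, 0.72, 0.72, 0.75]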

ML for Crowdsourcing Systems
General crowdsourcing systems such as Duolingo, FoldIt, and Galaxy Zoo confront challenges of reliability, efficiency, and scalability, for which ML can provide powerful solutions. Many ML approaches have already been applied to output aggregation, quality control, workflow management, and incentive design, but there is much more that could be done, either through novel ML methods, major redesigns of workflows or mechanisms, or on new crowdsourcing problems. Topics here include:

  • Dealing with sparse, noisy labels over a large number of label classes, for example, in tagging image collections for Deep Learning-based computer vision algorithms.
  • Optimal budget allocation and active learning in crowdsourcing.
  • Open theoretical questions in crowdsourcing that can be addressed by statistics and learning theory, for instance, analyzing label aggregation algorithms such as EM, or budget allocation strategies. (A minimal EM sketch follows this list.)
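
As one concrete instance of the EM-style aggregation mentioned above, here is a minimal one-coin Dawid-Skene sketch for binary labels. It assumes a dense item-by-worker label matrix and a single symmetric accuracy per worker; both simplifications, and all names, are ours rather than from any particular paper.

    import numpy as np

    def dawid_skene_one_coin(L, n_iters=20):
        # L: (n_items, n_workers) 0/1 matrix; for simplicity every
        # worker labels every item (real data is usually sparse).
        q = L.mean(axis=1)  # init posterior P(y=1) from vote fractions
        for _ in range(n_iters):
            # M-step: worker accuracy = expected agreement with posterior.
            agree = q[:, None] * L + (1 - q)[:, None] * (1 - L)
            acc = np.clip(agree.mean(axis=0), 1e-3, 1 - 1e-3)
            # E-step: per-item posterior of the label given accuracies.
            log_one = (np.log(acc) * L + np.log(1 - acc) * (1 - L)).sum(axis=1)
            log_zero = (np.log(acc) * (1 - L) + np.log(1 - acc) * L).sum(axis=1)
            q = 1.0 / (1.0 + np.exp(log_zero - log_one))
        return q, acc

    # Example: worker 3 disagrees with the majority on most items and
    # ends up down-weighted relative to workers 1 and 2.
    L = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
    posteriors, accuracies = dawid_skene_one_coin(L)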