CrowdML - NIPS ’16 Workshop on Crowdsourcing and Machine Learning

Information

Location: NIPS 2016 Workshop, Barcelona, Spain
Room: Room 120 + 121
Date: 9 Dec, 2016
Resources: See more information on the CrowdML site
Contact: For any questions, please email nips16crowd AT gmail
Updates:
- 2016-11-25: NIPS workshop registration closes on 27th Nov.
- 2016-11-25: Abstracts of invited talks are available here!
- 2016-11-25: Final schedule is now available here!
- 2016-11-11: Poster format is max-width=36in (91cm), max-height=48in (122cm).
- 2016-10-15: List of accepted papers is now available here!

Overview

Building systems that seamlessly integrate machine learning (ML) and human intelligence can greatly push the frontier of our ability to solve challenging real-world problems. While ML research usually focuses on developing more efficient learning algorithms, it is often the quality and amount of training data that predominantly govern the performance of real-world systems. This is only amplified by the recent popularity of large-scale and complex learning methodologies such as Deep Learning, which can require millions to billions of training instances to perform well. The recent rise of human computation and crowdsourcing approaches, made popular by platforms like Amazon Mechanical Turk and CrowdFlower, enables us to systematically collect and organize human intelligence. Crowdsourcing research itself is interdisciplinary, combining economics, game theory, cognitive science, and human-computer interaction.

The goal of this workshop is to bring crowdsourcing and ML experts together to explore how crowdsourcing can contribute to ML and vice versa. Specifically, we will focus on the design of mechanisms for data collection and ML competitions, and conversely, applications of ML to complex crowdsourcing platforms.

Organizers

Adish Singla. ETH Zurich.
Matteo Venanzi. Microsoft.
Rafael M. Frongillo. University of Colorado Boulder.

Talks

Ben Hamner, Kaggle Co-founder & CTO.
Sewoong Oh, University of Illinois at Urbana-Champaign.
Jennifer Wortman Vaughan, Microsoft Research, New York City.

Schedule

The final schedule is now available below (please refresh the page to see the latest version!). Abstracts of invited talks are available here.

Session 1 (0830 - 1030)

0830 - 0900: Poster setup (setting up 20 posters will take time, so come early to grab the best spot!).
0900 - 0905: Opening Remarks.
0905 - 0955: Jennifer Wortman Vaughan, "The Communication Network Within the Crowd".
0955 - 1005: Edoardo Manino, "Efficiency of Active Learning for the Allocation of Workers on Crowdsourced Classification Tasks".
1005 - 1015: Yao-Xiang Ding, "Crowdsourcing with Unsure Option".
1015 - 1025: Yang Liu, "Doubly Active Learning: When Active Learning meets Active Crowdsourcing".

Coffee + Posters (1030 - 1100)

Session 2 (1100 - 1230)

1100 - 1145: Sewoong Oh, "The Minimax Rate for Adaptive Crowdsourcing".
1145 - 1155: Matteo Venanzi, "Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems".
1155 - 1205: Miles E. Lopes, "A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles".
1210 - 1230: Ashish Kapoor, "Identifying and Accounting for Task-Dependent Bias in Crowdsourcing".

Lunch (1230 - 1400)

Session 3 (1400 - 1500)

1400 - 1415: Boi Faltings, "Incentives for Effort in Crowdsourcing Using the Peer Truth Serum".
1415 - 1430: David Parkes, "Peer Prediction with Heterogeneous Tasks".
1430 - 1445: Jens Witkowski, "Proper Proxy Scoring Rules".
1445 - 1500: Jordan Suchow, "Rethinking Experiment Design as Algorithm Design".

Coffee + Posters (1500 - 1530)

Session 4 (1530 - 1800)

1530 - 1630: Ben Hamner (Kaggle), "Kaggle Competitions and The Future of Reproducible Machine Learning".
1630 - 1800: Poster Session.

Accepted Papers

Research Track

"Incentives for Effort in Crowdsourcing Using the Peer Truth Serum"; Boi Faltings, Goran Radanovic, Radu Jurca.
"Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems"; Matteo Venanzi, John Guiver, Pushmeet Kohli, Nicholas R. Jennings.
"Doubly Active Learning: When Active Learning meets Active Crowdsourcing"; Yang Liu, Yining Wang.
"A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles"; Miles E. Lopes.
"Crowdsourcing with Unsure Option"; Yao-Xiang Ding, Zhi-Hua Zhou.
"Proper Proxy Scoring Rules"; Jens Witkowski, Pavel Atanasov, Lyle Ungar, Andreas Krause.
"Rethinking Experiment Design as Algorithm Design"; Jordan Suchow, Thomas Griffiths.
"Efficiency of Active Learning for the Allocation of Workers on Crowdsourced Classification Tasks"; Edoardo Manino, Long Tran-Thanh, Nicholas Jennings.
"Peer Prediction with Heterogeneous Tasks"; Debmalya Mandal, Matthew Leifer, David Parkes, Galen Pickard, Victor Shnayder.
"Identifying and Accounting for Task-Dependent Bias in Crowdsourcing"; Ece Kamar, Ashish Kapoor, Eric Horvitz.

Encore Track

"Reliable Crowdsourcing under the Generalized Dawid-Skene Model"; NIPS 2016.
"Analyzing Large-Scale Public Campaigns on Twitter"; SocInfo 2016.
"Low-Cost Learning via Active Data Procurement"; EC 2015.
"Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction"; NIPS 2016.
"Opportunities or Risks to Reduce Labor in Crowdsourcing Translation?"; IJCAI 2015.
"Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds"; AAAI 2016.
"Optimality of Belief Propagation for Crowdsourced Classification"; ICML 2016.
"Finding One's Best Crowd: Online Learning By Exploiting Source Similarity"; AAAI 2016.
"False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking"; ICML 2016.
"Sensor Selection for Crowdsensing Dynamical Systems"; AISTATS 2015.
"Crowdsourced Semantic Matching of Multi-Label Annotations"; IJCAI 2015.
"An Optimal Algorithm for the Thresholding Bandit Problem"; ICML 2016.
"Crowdsourced Clustering: Querying Edges vs Triangles"; NIPS 2016.
"Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing"; ICML 2015.
"Active Learning from Imperfect Labelers"; NIPS 2016.
"Ability Grouping of Crowd Workers via Reward Discrimination"; HCOMP 2013.
"Learning and Feature Selection under Budget Constraints in Crowdsourcing"; HCOMP 2016.
"Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing"; NIPS 2016.

Topics of Interest

Topics of interest in the workshop include (but are not limited to):

Crowdsourcing for Data Collection
Crowdsourcing is one of the most popular approaches to data collection for ML, and therefore one of the biggest avenues through which crowdsourcing can advance the state of the art in ML. We seek cost-efficient and fast data collection methods based on crowdsourcing, and ask how design decisions in these methods could impact subsequent stages of ML systems. Topics of interest include:

Basic annotation: What is the best way to collect and aggregate labels for unlabeled data from the crowd? How can we increase fidelity by flagging labels as uncertain given the crowd feedback? How can we do the above in the most cost-efficient manner?
Beyond simple annotation tasks: What is the most effective way to collect probabilistic data from the crowd? How can we collect data requiring global knowledge of the domain such as building Bayes net structure via crowdsourcing?
Time-sensitive and complex tasks: How can we design crowdsourcing systems to handle real-time or time-sensitive tasks, or those requiring more complicated work dependencies? Can we encourage collaboration on complex tasks?
Data collection for specific domains: How can ML researchers apply crowdsourcing principles to specific domains (e.g., healthcare) where privacy and other concerns are at play?
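
As a concrete reference point for the basic annotation questions above, the simplest aggregation baseline is a per-item majority vote that flags low-agreement items as uncertain. The sketch below is only an illustration under stated assumptions: the function name `aggregate_majority`, the input layout, and the `min_agreement` threshold of 0.7 are hypothetical choices made for this example, not something prescribed by the workshop.

```python
from collections import Counter


def aggregate_majority(labels_by_item, min_agreement=0.7):
    """Aggregate crowd labels per item by majority vote.

    labels_by_item: dict mapping an item id to the list of labels it received.
    min_agreement: hypothetical threshold; an item is flagged as uncertain
        when the vote share of its winning label falls below this value.
    Returns a dict mapping item id -> (label, vote_share, is_uncertain).
    """
    results = {}
    for item, labels in labels_by_item.items():
        if not labels:
            # No labels collected yet: nothing to aggregate, always uncertain.
            results[item] = (None, 0.0, True)
            continue
        counts = Counter(labels)
        # On a tie, most_common returns the label that was seen first.
        label, votes = counts.most_common(1)[0]
        share = votes / len(labels)
        results[item] = (label, share, share < min_agreement)
    return results


if __name__ == "__main__":
    crowd_labels = {
        "img1": ["cat", "cat", "cat", "dog"],
        "img2": ["cat", "dog"],
    }
    for item, result in aggregate_majority(crowd_labels).items():
        print(item, result)
```

Flagged items are natural candidates for additional labels, which is one simple way to connect the fidelity question to the cost-efficiency question: extra budget is spent only where the crowd disagrees.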

Crowdsourcing ML Research via Competitions
Through the Netflix challenge and now platforms like Kaggle, we are seeing the crowdsourcing of ML research itself. Yet the mechanisms underlying these competitions are extremely simple. Here our focus is on the design of such competitions; topics of interest include:

What is the most effective way to incentivize the crowd to participate in ML competitions? What is the most efficient mechanism: rather than the typical winner-takes-all format, can we design one that makes better use of the net research-hours devoted to the competition?
Competitions as recruiting: how would we design a competition differently if (as is often the case) the result is not a winning algorithm but instead a job offer?
Privacy issues with data sharing are one of the key barriers to holding such competitions. How can we design privacy-aware mechanisms which allow enough access to enable a meaningful competition?
Challenges arising from the sequential and interactive nature of competitions, e.g., how can we maintain unbiased leaderboards without allowing for overfitting?

ML for Crowdsourcing Systems
General crowdsourcing systems such as Duolingo, FoldIt, and Galaxy Zoo confront challenges of reliability, efficiency, and scalability, for which ML can provide powerful solutions. Many ML approaches have already been applied to output aggregation, quality control, workflow management, and incentive design, but there is much more that could be done, whether through novel ML methods, major redesigns of workflows or mechanisms, or work on new crowdsourcing problems. Topics here include:

Dealing with sparse, noisy labels and a large number of label classes, for example, when tagging image collections for Deep Learning based computer vision algorithms.
Optimal budget allocation and active learning in crowdsourcing.
Open theoretical questions in crowdsourcing that can be addressed by statistics and learning theory, for instance, analyzing label aggregation algorithms such as EM, or budget allocation strategies.
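
To make the reference to EM-based label aggregation concrete, here is a minimal sketch of EM for binary labels under a "one-coin" simplification of the Dawid-Skene model, in which each worker is described by a single accuracy parameter. Everything specific in it is an assumption made for illustration: the function name `one_coin_em`, the `(item, worker, label)` input layout, the uniform prior over true labels, the soft majority-vote initialisation, and the fixed iteration count. It is not the method of any particular paper listed on this page.

```python
import math


def one_coin_em(triples, n_iter=50, eps=1e-6):
    """EM for binary crowd labels under a one-coin worker model.

    triples: list of (item, worker, label) with label in {0, 1}.
    Returns (q, acc): q[item] is the posterior probability that the true
    label is 1, acc[worker] is the estimated probability of a correct label.
    Assumes a uniform prior over true labels (an illustrative choice).
    """
    items = sorted({i for i, _, _ in triples})
    workers = sorted({w for _, w, _ in triples})

    # Initialise posteriors with the fraction of 1-votes (soft majority vote).
    votes = {i: [] for i in items}
    for i, _, label in triples:
        votes[i].append(label)
    q = {i: sum(v) / len(v) for i, v in votes.items()}

    acc = {}
    for _ in range(n_iter):
        # M-step: worker accuracy = expected fraction of correct labels.
        correct = {w: 0.0 for w in workers}
        total = {w: 0 for w in workers}
        for i, w, label in triples:
            correct[w] += q[i] if label == 1 else 1.0 - q[i]
            total[w] += 1
        # Clip away from 0 and 1 so the logarithms below stay finite.
        acc = {w: min(max(correct[w] / total[w], eps), 1.0 - eps)
               for w in workers}

        # E-step: posterior over each true label given worker accuracies.
        log1 = {i: 0.0 for i in items}  # log-likelihood if true label is 1
        log0 = {i: 0.0 for i in items}  # log-likelihood if true label is 0
        for i, w, label in triples:
            p = acc[w]
            log1[i] += math.log(p if label == 1 else 1.0 - p)
            log0[i] += math.log(p if label == 0 else 1.0 - p)
        q = {}
        for i in items:
            # Clamp the log-odds to avoid overflow in math.exp.
            diff = max(min(log0[i] - log1[i], 700.0), -700.0)
            q[i] = 1.0 / (1.0 + math.exp(diff))
    return q, acc


if __name__ == "__main__":
    truth = [1, 0, 1, 0, 1]
    data = []
    for item, z in enumerate(truth):
        data.append((item, "A", z))      # worker A always answers correctly
        data.append((item, "B", z))      # worker B always answers correctly
        data.append((item, "C", 1 - z))  # worker C always answers incorrectly
    posteriors, accuracies = one_coin_em(data)
    print(posteriors)
    print(accuracies)
```

Unlike plain majority voting, this procedure weights each vote by the worker's estimated reliability, so a consistently wrong worker ends up counting as evidence for the opposite label. The model is symmetric under flipping all labels, which is why the majority-vote initialisation matters in this sketch.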