A Crowdsourced Corpus of AAC-like Communications

We used Amazon Mechanical Turk to create a large set of fictional AAC-like communications. Workers were asked to invent communications as if they were using a scanning-style AAC interface. Our AAC corpus contains approximately six thousand communications. We found that our crowdsourced collection modeled conversational AAC better than datasets based on telephone conversations or newswire text. We also used our crowdsourced messages to select well-matched sentences from much larger sets of Twitter, blog, and Usenet data. For details, see our paper.

Below you can download our corpus of communications, some of the test sets we used, and some of our trained language models. Language models are in ARPA text format. If you use this resource in your research, please cite our paper.

We thank Keith Trnka for allowing us to provide the Switchboard test set. We thank Horabail Venkatagiri for allowing us to provide the communication test set. Our specialists test set was created from phrases suggested by AAC professionals on pages formerly hosted at the University of Nebraska-Lincoln. The original pages are no longer available.

With the exception of lm_test_switch.txt and lm_test_comm.txt, the resources listed below are licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Corpus:

Language models:
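
Since the language models are distributed in ARPA text format, a minimal sketch of reading such a file in Python may be useful. It assumes the standard ARPA layout: a \data\ header with n-gram counts, one section per n-gram order, and base-10 log probabilities with an optional backoff weight. The function name read_arpa and the tiny sample model are hypothetical and are not taken from the downloads.

```python
def read_arpa(lines):
    """Parse ARPA-format text into (counts, ngrams).

    counts maps n-gram order -> number of n-grams declared in the header.
    ngrams maps a tuple of words -> (log10 probability, backoff or None).
    """
    counts = {}
    ngrams = {}
    order = None  # order of the section currently being read
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if order is None and line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])
        elif order is not None:
            tokens = line.split()
            logprob = float(tokens[0])
            words = tuple(tokens[1:1 + order])
            # A backoff weight is present only if a field follows the words.
            backoff = float(tokens[1 + order]) if len(tokens) > 1 + order else None
            ngrams[words] = (logprob, backoff)
    return counts, ngrams


# Made-up bigram model, for illustration only.
SAMPLE = [
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\t<s>\t-0.5",
    "-0.8\thello\t-0.3",
    "-1.2\t</s>",
    "",
    "\\2-grams:",
    "-0.4\t<s> hello",
    "-0.6\thello </s>",
    "",
    "\\end\\",
]

counts, ngrams = read_arpa(SAMPLE)
```

To read one of the downloaded models, the same function could be passed an open file handle, since it only iterates over lines. Large models will use a lot of memory this way; a dedicated language modeling toolkit is likely a better choice for actual experiments.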