The paper, titled "The Effects of Automatic Speech Recognition Quality on Human Transcription Latency," explores how automated speech recognition and crowd-sourced human correction and generation of transcripts can be traded off to improve accuracy and reduce latency.
Transcription makes speech accessible to deaf and hard of hearing people. Despite its high cost, converting speech to text is still done manually by human experts in most real-world settings because the quality of automated speech recognition (ASR) is still too low. Manual conversion can require more than 5 times the original audio duration, which also introduces significant latency. Giving transcriptionists ASR output as a starting point seems like a reasonable way to make humans more efficient and thereby reduce this cost, but the effectiveness of this approach depends on the quality of the speech recognition output. At high error rates, fixing inaccurate ASR output may take longer than producing the transcription from scratch, and transcriptionists may not always realize when ASR output is too inaccurate to be useful. By better understanding this problem, future systems can more efficiently combine both sources of transcription.
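The trade-off described above hinges on the ASR error rate, which is conventionally measured as word error rate (WER): the word-level edit distance (substitutions, insertions, and deletions) between the ASR hypothesis and a reference transcript, divided by the reference length. As a minimal illustrative sketch (not code from the paper), WER can be computed with dynamic programming:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words: WER = 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A transcript with a WER near zero is cheap for a human to verify, while one with a high WER may cost more to repair than to retype, which is the regime the paper investigates.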
Prof. Walter Lasecki received his PhD in Computer Science from the University of Rochester in 2015 and joined the faculty at the University of Michigan the same year. His research combines human and machine computation to create intelligent systems that can solve problems ranging from accessibility needs for users with disabilities to rapid analysis of large data sets. He has helped to introduce continuous real-time crowdsourcing, as well as the crowd agent model, which uses computer-mediated groups of people submitting input simultaneously to create a collective intelligence capable of completing tasks better than any constituent member.
Posted: April 20, 2016