From Time magazine:
The work was vital for OpenAI. ChatGPT’s predecessor, GPT-3, had already shown an impressive ability to string sentences together. But it was a difficult sell, as the app was also prone to blurting out violent, sexist and racist remarks. This is because the AI had been trained on hundreds of billions of words scraped from the internet—a vast repository of human language. That huge training dataset was the reason for GPT-3’s impressive linguistic capabilities, but was also perhaps its biggest curse. Since parts of the internet are replete with toxicity and bias, there was no easy way of purging those sections of the training data. Even a team of hundreds of humans would have taken decades to trawl through the enormous dataset manually.
By “toxicity and bias” what they really mean is content that “described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest.”
Perhaps $2 per hour is a lot in Kenya, but that’s no excuse for exploiting people. And given the absolute cesspool that OpenAI asked these workers to sift through, they should have included additional hazard pay.