OpenAI Transcribes Over A Million YouTube Hours: Navigating The Gray Area Of AI Data Use

    OpenAI developed its Whisper audio transcription model, which was reportedly used to transcribe over a million hours of YouTube videos, owned by Alphabet Inc. (GOOGL, GOOG), to train GPT-4.

    The initiative, described as a way around the limited availability of training data, has stirred debate over the legality and ethics of such data acquisition practices, The New York Times reported.

    The newspaper reported that OpenAI was aware of the legal uncertainty surrounding this method but believed it fell within the bounds of fair use. Greg Brockman, president of OpenAI, was personally involved in selecting the videos for transcription.

    Responding to inquiries, OpenAI spokesperson Lindsay Held told The Verge that OpenAI builds "unique" datasets for its models to help their "understanding of the world" and to stay competitive in global research.

    Held said OpenAI draws on a range of data sources, including publicly available data, partnerships for access to non-public data, and research into generating synthetic data.

    This development came amid growing concerns within the AI industry over the availability of quality training data.

    The Wall Street Journal reported earlier that AI companies could run out of new content to train on by 2028, and pointed to synthetic data and curriculum learning as possible solutions.

    The practice of using large amounts of internet content, including YouTube videos, without explicit permission has sparked multiple legal and ethical debates, underscoring the delicate balance AI developers must strike between innovation and copyright compliance.

    Photos: Shutterstock