The tool was trained with machine learning on billions of lines of code from public repositories and can transform natural language into code snippets across dozens of programming languages. But there’s a problem with this approach.
Clipping authors out
While Copilot can speed up the process of writing code and ease software development, its reliance on public open-source code has raised concerns among experts that it violates licensing requirements around attribution and usage limitations.
Open-source licences, such as the GPL, Apache, and MIT licences, require attribution of the author’s name and define particular copyright terms.
The complaint, filed by the Joseph Saveri Law Firm, which represents Butterick in the litigation, alleges violations of:
- GitHub’s terms of service and privacy policies,
- DMCA 1202, which forbids the removal of copyright-management information,
- the California Consumer Privacy Act,
- and other laws giving rise to the related legal claims.
The complaint was submitted to the U.S. District Court for the Northern District of California, seeking statutory damages of $9,000,000,000.
“Each time Copilot provides an unlawful Output it violates Section 1202 three times (distributing the Licensed Materials without: (1) attribution, (2) copyright notice, and (3) License Terms). So, if each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000,” reads the complaint.
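The complaint’s figures can be checked with simple arithmetic. A minimal sketch, assuming the user count of 1,200,000 implied by the quoted numbers (3,600,000 violations ÷ 3 violations per output):

```python
# Sketch of the damages arithmetic quoted in the complaint.
# The user count is an assumption derived from the filing's own figures;
# this is an illustration, not an independent legal calculation.

VIOLATIONS_PER_OUTPUT = 3        # no attribution, no copyright notice, no license terms
ASSUMED_USERS = 1_200_000        # implied by 3,600,000 total violations / 3
MIN_STATUTORY_DAMAGES = 2_500    # minimum per Section 1202 violation, per the complaint

# Each user is assumed to receive just one violating Output.
total_violations = ASSUMED_USERS * VIOLATIONS_PER_OUTPUT
total_damages = total_violations * MIN_STATUTORY_DAMAGES

print(total_violations)  # → 3600000
print(total_damages)     # → 9000000000
```

The result matches the $9,000,000,000 in statutory damages the complaint demands.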
In a blog post earlier in October, Butterick also discussed the damage Copilot could do to the open-source community.
He argued that the incentive for open-source contribution and collaboration is essentially removed when people are handed code snippets without ever being told who created the code or how to attribute it.
“Microsoft is creating a new walled garden that will inhibit programmers from discovering traditional open-source communities. Over time, this process will starve these communities. User attention and engagement will be shifted […] away from the open-source projects themselves—away from their source repos, their issue trackers, their mailing lists, their discussion boards,” writes Butterick.
He fears that, given enough time, Copilot will cause open-source communities to decline, and that the quality of the AI’s training data will diminish in turn.