My overall conclusion on this complex topic is that GitHub violated copyright. My opinion is based largely on this paper.
People argue that GitHub did not infringe any copyright because they used an A.I… However, using a tool to create a derivative work based on intellectual property that you don’t have the rights to use doesn’t change the fact that you’re violating copyright.
It also doesn’t matter that the GPL doesn’t explicitly prohibit the use of AI/ML tools for doing so (just like it doesn’t matter that the GPL doesn’t explicitly say you can’t use Copy & Paste as your tool of theft).
The excuse GitHub uses is that this is untested ground, basically implying that it is okay to scrape and use your code without any permission because it is collected through an A.I., which is, as said, not mentioned in the license. GitHub also later changed its Terms of Service to allow itself to use your code. In other words, you give GitHub the right to host your code and to use your code to improve their products and features.
Regardless of what their own platform's terms say, GitHub is thankfully not the law, and I disagree that A.I. scraping should be accepted in the open-source community.
The legal basis for my final statement is the copyright law of the US (Title 17), Chapter 1, Section 117.