Damn, looks like George Webb is gonna be right again about OpenAI and Suchir Balaji. Now this is interesting.
I saw that. It wouldn't surprise me a bit.
But again, from my point of view, which might be narrow, how and where they acquired the datasets is not that interesting. It might explain how they did it so cheaply, but there is still the possibility that they have superior model search and reasoning chain algorithms.
Datasets are going to be a commodity. Companies will be selling/licensing just the datasets as products that have been web-scraped and trained, both general datasets and specialized domain datasets like those that might be trained for financial or engineering knowledge. But you still need searching and reasoning code. To me THAT is where the secret sauce lives. I think web-scraping and dataset pruning and training are already well understood.
What is needed is better search and reasoning logic that doesn't take a small-city-sized server farm to use that data.
So steal the algorithm, use the datasets you already have. Forget the DeepSeek app. That is where I see the value for Western startups.
Unless the search/reasoning algorithm is crap, in which case we will learn pretty quick because we have most of the source. If that turns out to be the case, then never mind.