How DeepSeek Is Disrupting AI with Lower Costs and Greater Transparency

The tech world has been shaken by the recent rise of DeepSeek, a new AI-powered chatbot app developed by a company founded only in 2023 by Liang Wenfeng. It quickly surpassed OpenAI’s ChatGPT to become the most downloaded free iOS app in the United States, and chip-making giant Nvidia lost $600 billion in market value in a single day, a new US stock market record.

At the heart of DeepSeek’s success is its “large language model” (LLM), which is capable of reasoning in ways similar to OpenAI’s models such as GPT-4, yet was developed at a fraction of the cost to train and run. How did DeepSeek manage this? The company’s approach relies on a range of technical strategies that dramatically reduce the computational time and memory required to train its model, named R1. DeepSeek claims that training R1 cost under $6 million, far less than the $100 million OpenAI reportedly spent to develop GPT-4.

The cost reduction could have far-reaching effects. AI models place heavy demands on resources, especially in terms of energy usage and environmental impact: training and running them often requires enormous amounts of electricity and water.

However, we must ask whether the energy savings of DeepSeek’s approach will be enough to offset the increase in overall usage as more people adopt this cheaper, more accessible AI. Only time will tell, but it is certainly encouraging to see efficiency and sustainability gaining importance in the AI industry.

What is also noteworthy is the transparency that DeepSeek brings to the table. While OpenAI’s models are often seen as “black boxes,” DeepSeek has openly shared the numerical parameters of its model as well as a technical paper describing its development.

Moreover, DeepSeek’s model is built using a “mixture of experts” technique, which has also been used by other models such as Mistral AI’s Mixtral 8x7B. The method divides the model into smaller units, each specializing in a particular domain, and assigns each task to the most qualified units, as the sketch below illustrates.
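To make the routing idea concrete, here is a minimal sketch of a mixture-of-experts layer in Python with NumPy. It is an illustrative toy, not DeepSeek’s actual R1 architecture: the dimensions, the random weights, and the top-K gating rule are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: input dim, hidden dim, number of experts, experts used per token.
D, H, E, K = 16, 32, 4, 2

# Each "expert" is a small feed-forward network; the gate scores experts per token.
expert_w1 = rng.normal(0, 0.1, size=(E, D, H))
expert_w2 = rng.normal(0, 0.1, size=(E, H, D))
gate_w = rng.normal(0, 0.1, size=(D, E))

def moe_layer(x):
    """Route each token to its top-K experts and mix their outputs."""
    scores = x @ gate_w                         # (tokens, E) raw gate scores
    topk = np.argsort(scores, axis=-1)[:, -K:]  # indices of the K best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        # Softmax over only the selected experts' scores to get mixing weights.
        w = np.exp(scores[t, sel] - scores[t, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            hidden = np.maximum(x[t] @ expert_w1[e], 0.0)  # ReLU feed-forward expert
            out[t] += weight * (hidden @ expert_w2[e])
    return out

tokens = rng.normal(size=(8, D))  # a batch of 8 token embeddings
print(moe_layer(tokens).shape)    # (8, 16): same shape in and out
```

The point of the design is that only K of the E experts run for any given token, so the compute per token stays close to that of a much smaller dense model even as total capacity grows, which is the core of the efficiency argument.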

DeepSeek might be showing the world that you don’t need massive resources to develop high-quality AI.

The AI market has historically been dominated by large US tech companies, and the rise of companies like DeepSeek serves as a wake-up call, especially to big players like Nvidia. While the immediate impact on companies like Nvidia might seem negative, a more affordable AI industry could lead to greater adoption of AI technology in the long run.

Smaller companies like DeepSeek are starting to play a more significant role in shaping the AI landscape. It is easy to underestimate the potential of emerging players, but the rapid success of DeepSeek is proof that innovation does not always come from the biggest players.

Q&A

Q: What makes DeepSeek’s AI model different from others like GPT-4?

A: DeepSeek’s AI model, R1, stands out because it offers similar reasoning capabilities to models like GPT-4 but at a much lower cost.

Q: How did DeepSeek manage to reduce the cost of training its AI model?

A: DeepSeek reduced the computational time and memory requirements for training by optimizing its processes. For example, its model, R1, reportedly cost under $6 million to train, compared with the more than $100 million OpenAI reportedly spent on GPT-4. DeepSeek also leveraged a “mixture of experts” technique, in which specialized smaller models handle specific functions.

Q: What is the “mixture of experts” technique used by DeepSeek?

A: The “mixture of experts” technique divides the AI model into several smaller models, each with expertise in a specific domain, and routes each input to the most relevant of them.

Q: What impact could DeepSeek have on the dominance of US-based tech giants in AI?

A: DeepSeek’s rapid rise signals that AI innovation is not limited to US-based companies.
