History has a way of favouring the bold, the nimble, and those who make the most of the resources they have. The dominance of even large companies like OpenAI is not unshakable, as recently demonstrated by DeepSeek, a Chinese startup that developed its advanced AI model for only $6 million, a fraction of the cost incurred by industry leaders. Its success underscores a powerful truth: raw computing power alone does not determine the victor in AI. Innovation, adaptability, and unconventional thinking can tip the scales in unexpected ways.
For India, this truth carries profound significance. With over 143 crore people and a vast linguistic and cultural tapestry, the opportunity is huge: Hindi, Bengali, Tamil, Telugu, Marathi, and many other languages are spoken by crores, yet AI models today predominantly cater to English. A homegrown approach is all but essential if we are to unleash our nation’s full economic potential.
Recognizing this, India has made significant investments in AI. As part of the IndiaAI Mission, the government has allocated ₹10,370 crores to drive innovation and establish a competitive edge in the AI sector. Industry reports project that AI could generate over 27 lakh jobs in India by 2028. The data and AI sector is also expected to contribute $500 billion to the Indian economy in the next five years. However, for this potential to fully translate into reality, India must develop AI models tailored to its linguistic and economic landscape.
The Rise of Local LLMs
Building domestic AI capabilities is central to this vision. India’s Make-in-India vision extends beyond manufacturing; it encompasses technological self-reliance across fields, including AI. The IndiaAI mission exemplifies this commitment through the development of a homegrown Large Language Model (LLM) trained on Indian datasets. By ensuring AI systems understand India’s diverse languages and contexts, local LLMs can unlock new economic opportunities, empower businesses, and fuel job creation, contributing to India’s larger goal of a $5 trillion economy by 2030.
Several Indian startups have already embarked on this journey. Sarvam AI supports 11 Indian languages, including Hindi, Tamil, Bengali, Telugu, and Marathi. Then there’s Krutrim, which is focussing on developing AI with an emphasis on Indian languages and cultural context. But the journey is fraught with challenges. Testing Krutrim’s model revealed some inconsistencies: a question posed in Hindi returned a response that unpredictably wove in multiple other languages. The intent is there; the execution, not quite. This is not a critique; it simply shows how much work still needs to be done.
The Missing Ingredient: Data
Building an LLM is not just about algorithms; it’s about data, the lifeblood of any AI model. Unfortunately, high-quality training data in Indian languages is scarce. English dominates the internet, academic research, and digitized archives. However, efforts are under way to digitize more material.
In 2021, at the Lang Library in Gujarat, one of India’s largest public libraries, over 12,000 books, including century-old texts, had been digitized over a period of four years at a cost of ₹40 lakh. However, these works were accessible only on 12 computers on the library premises. Although the library aimed to make the collection available online, a library official cited funding and copyright permissions as challenges. Initiatives like this are indeed a step forward, but they underscore how far India still has to go in digitizing its vast cultural and linguistic resources.
Indian newspapers are another vast, untapped repository of content. The New York Times has successfully digitized its entire history of publishing newspapers dating back to 1851. But no Indian newspaper—particularly regional language ones—has achieved this level of archival completeness. The rich history, cultural narratives, and linguistic evolution captured in old newspapers remain largely inaccessible to AI models, limiting their ability to understand and generate contextually relevant responses.
Outside of written content, even within a single language, variation abounds, adding to the challenge. Bengali spoken in Kolkata is vastly different from Bengali spoken in Dhaka. Marathi in Mumbai doesn’t always resemble the Marathi of rural Vidarbha. Indian AI models must do more than recognize words—they must grasp the intricate layers of dialects, idioms, and regional expressions.
Infrastructure: The Unsung Hero
The good news? India is quietly building the infrastructure necessary for this AI revolution. Data centers, once a rarity, are now booming. AWS, Microsoft, and Google are pouring billions into Indian cloud infrastructure, creating the backbone necessary to train and deploy powerful AI models. For Indian startups, this means they no longer need Silicon Valley-scale resources to compete. The playing field is leveling.
Moreover, the Government of India is actively supporting this growth by facilitating access to high-performance computing resources. According to recent reports, the government has approved the empanelment of 18,693 GPUs, with approximately 10,000 GPUs ready for immediate installation. A common compute facility is set to launch soon, providing startups and researchers with the necessary computational power to train and refine their models at subsidized prices.
AI and National Identity
As India pushes forward in AI, a deeper question emerges: Who controls the narrative? LLMs, by design, reflect the biases of their creators. DeepSeek, for instance, aligns strongly with Chinese governmental policies. When asked about Taiwan, it states unequivocally that the island belongs to China. This raises an urgent question for India: Can we build AI that safeguards national interests while remaining balanced and ethical? A truly Indian LLM must not only be linguistically competent but also responsible. It must present historical and political narratives from multiple perspectives while protecting the sovereignty of India.
The Road Ahead
Indigenizing LLMs for India is not about competing with OpenAI or DeepSeek—it’s about something far greater. It’s about ensuring that India’s voice, in all its linguistic and cultural diversity, is not an afterthought in the global AI discourse. With more proactive investments in this space and initiatives to make data digitized and accessible for LLMs to build on, India can take a strong leap forward.
The author is Bhavesh Goswami, Founder & CEO of CloudThat.
Disclaimer: The views expressed are solely of the author and ETCIO does not necessarily subscribe to it. ETCIO shall not be responsible for any damage caused to any person/organization directly or indirectly.