Japan Government Tests 7 Domestic LLMs for 180K Civil Servants

Japan's Digital Agency announced the selection of seven domestically developed large language models for verification within the government AI platform "Gennai," marking a significant step in Japan's AI sovereignty strategy and domestic technology ecosystem development.

Japan Government Initiates Verification of Seven Domestic LLMs: Comprehensive Deep-Dive Analysis

I. Event Overview

On March 6, 2026, Japan's Digital Agency officially announced the selection of seven domestically developed large language models (LLMs) for verification within the government's AI platform "Gennai" (ガバメントAI・源内). The Gennai platform serves as the core AI infrastructure for Japan's central government, providing AI services to approximately 180,000 civil servants across all 39 central government agencies.

The seven selected models were chosen from 15 applicants through a rigorous evaluation process conducted between December 2, 2025 and January 31, 2026, involving document review and performance assessments. Selection criteria encompassed multiple dimensions including domestic development origin, administrative utility performance, comparative benchmarking against leading overseas LLMs, safety measures, training data legal compliance, and security requirements for operation on the Government Cloud infrastructure.

This initiative represents a continuation of Japan's strategic positioning in the AI domain. In May 2025, the Digital Agency formulated guidelines for "procurement and utilization of generative AI for administrative evolution and innovation," which came into full effect on April 1, 2026. The domestic model verification project constitutes a core implementation component of these guidelines.

II. Detailed Analysis of Selected Models

The seven domestic LLMs selected for verification represent diverse technical approaches and institutional backing within Japan's AI ecosystem:

NTT tsuzumi 2: Developed by NTT, Japan's largest telecommunications operator, tsuzumi 2 represents the latest iteration of NTT's flagship LLM series. The model excels in Japanese language understanding and generation, with deep optimization for business and administrative terminology.
NTT brings decades of accumulated research expertise in natural language processing to this endeavor.

CC Gov-LLM (Customer Cloud): A purpose-built LLM designed for government administrative use, with particular emphasis on security and compliance features. The model targets administrative document processing, policy analysis, and citizen service scenarios.

Llama-3.1-ELYZA-JP-70B (KDDI and ELYZA): Built on Meta's open-source Llama 3.1 architecture with Japanese-specialized training by ELYZA, this 70-billion-parameter model demonstrates the Japanese industry's capability to construct localized models on open-source foundations. The collaboration between KDDI, Japan's second-largest telecom operator, and AI specialist ELYZA showcases a pragmatic approach to domestic model development.

Sarashina2 mini (SoftBank): A lightweight LLM developed by SoftBank, Japan's third-largest telecommunications operator. Under Masayoshi Son's leadership, SoftBank has been an aggressive investor in AI, and Sarashina2 mini represents the company's substantive effort in proprietary AI model development.

cotomi v3 (NEC): Developed by NEC, a long-established Japanese IT enterprise with deep roots in government information systems. The cotomi series is NEC's core LLM product line, and the company's extensive experience with government IT infrastructure provides unique advantages for administrative applications.

Takane 32B (Fujitsu): A 32-billion-parameter model from Fujitsu, Japan's largest IT services company. Fujitsu's formidable R&D capabilities in supercomputing (Fugaku) and quantum computing provide a strong technological foundation for Takane 32B.

PLaMo 2.0 Prime (Preferred Networks): Preferred Networks is one of Japan's most influential AI startups, with internationally recognized expertise in deep learning frameworks and applications. PLaMo 2.0 Prime represents the highest level of achievement by Japanese startups in the LLM domain.

III. The GENIAC Program and Japan's Public-Private Collaboration Model

Several of the selected models have received funding support from GENIAC (Generative AI Accelerator Challenge), a program promoted by the Ministry of Economy, Trade and Industry (METI) and NEDO (New Energy and Industrial Technology Development Organization). For example, Rakuten AI 3.0, while not selected for this particular verification, was adopted under GENIAC's third round of public recruitment and received partial development cost subsidies.

The GENIAC program exemplifies Japan's distinctive "public-private collaboration" approach to AI development. Unlike the United States, which relies primarily on private sector investment, or China, which employs state-directed massive funding, Japan has adopted a middle path: the government provides foundational support and strategic direction while enterprises handle specific R&D and commercialization. This model's advantages include concentrated resource allocation and reduced duplication; its limitations include potentially constraining the freedom and speed of innovation.

IV. Verification Timeline and Process

According to the Digital Agency's published plan, the domestic model verification will proceed along the following timeline:

Around May 2026: Launch of large-scale empirical experiments on the Gennai platform.
Around August 2026: Formal trial deployment of domestic LLM models begins.
During the verification period: Assessment of model performance in conversational AI services and specialized administrative AI applications.
Around January 2027: Publication of partial verification results.
After April 2027: Based on verification outcomes, superior models will be formally procured as government AI systems through paid contracts.

The verification scope extends well beyond basic performance benchmarking. It covers in-depth evaluation of model capabilities in real administrative work scenarios, including document drafting, policy analysis, data organization, and citizen inquiry response.

V. AI Sovereignty in Global Context

The deeper motivation behind Japan's domestic LLM verification initiative is the globally intensifying trend toward "AI sovereignty." In the current global AI market, American companies, notably OpenAI, Google, and Anthropic, hold overwhelmingly dominant positions. For a nation with a unique linguistic and cultural system like Japan, complete dependence on foreign AI models presents multiple risk categories:

Data Security Risks: Government administrative data involves national security and citizen privacy, and relying on foreign AI models may expose this data to leakage or compelled sharing.
Technology Dependency Risks: If core AI capabilities are entirely controlled by foreign companies, Japan's autonomy in critical technology domains faces severe constraints.
Linguistic and Cultural Risks: Foreign models may lack precision in Japanese language understanding, cultural context comprehension, and administrative terminology compared to domestically developed alternatives.

The European Union, South Korea, India, and other nations and regions are pursuing similar AI sovereignty strategies. Japan's domestic model verification aligns with international precedents including France's Mistral AI, the UAE's Falcon, and India's Sarvam, reflecting a global trend: governments worldwide are actively seeking technological autonomy in the AI era.

VI. Challenges and Forward Outlook

The most significant challenge facing Japan's domestic LLMs is the performance gap with international frontier models such as OpenAI's GPT-5.4 and Google's Gemini. In terms of parameter scale, Fujitsu's Takane has 32 billion parameters, and even the largest selected model whose size is stated above, Llama-3.1-ELYZA-JP-70B at 70 billion parameters, is a fraction of the size of the trillion-parameter models now emerging at the global frontier. How to develop products that can compete with international models in specific scenarios, given limited computational resources and R&D investment, remains the central question for Japan's AI industry.

However, the significance of this verification extends beyond performance competition. Its greater value lies in establishing an institutionalized process for evaluating and selecting domestic AI models, providing policy support and market assurance for the sustainable development of Japan's AI industry. If verification results demonstrate that domestic models have practical utility in specific administrative scenarios, this will significantly boost confidence in Japan's AI industry and potentially catalyze increased budget allocation for domestic AI R&D.

From a longer-term perspective, Japan's approach offers a valuable reference template for other small and medium-sized economies: when unable to compete with the United States and China in AI investment scale, focusing on specific application scenarios, strengthening linguistic and cultural adaptation, and establishing public-private collaboration mechanisms can chart a differentiated AI development pathway.

The outcome of this verification will serve as a critical indicator not only of Japan's domestic AI technical capabilities but also of the viability of sovereignty-oriented AI strategies more broadly. In an era where AI increasingly shapes government operations, economic competitiveness, and national security, the question of who controls the AI systems processing sensitive government data carries implications far beyond technical performance metrics. Japan's experiment in building a sovereign AI capability for government use will be closely watched by policymakers around the world as a model for responsible, security-conscious AI adoption in the public sector.