Datasets
Explore high-quality datasets for AI research and development
Showing 8 of 8 datasets
Nexus-Gen-Training-Dataset
by Nexus AI
Comprehensive training dataset drawn from diverse text sources for next-generation language models
CodeGen-Multilang-Dataset
by CodeAI
Large-scale code generation dataset covering 50+ programming languages with documentation
MathX-5M
by MathAI
5 million mathematical problems with step-by-step solutions for AI training
agibot_world_beta
by AgiBot Team
Large-scale robotics dataset for world understanding and manipulation tasks with comprehensive sensor data
AudioSet-Extended-2025
by Google Research
Extended version of AudioSet with additional categories and improved annotations for audio classification
Chinese-Qwen3-235B-2507-Distill
by Alibaba
Distilled Chinese-language dataset with high-quality annotations, optimized for Qwen3 model training
VisionQA-Multilingual
by Vision Team
Multilingual visual question answering dataset with rich annotations and cultural diversity
kontext-bench
by Research Lab
Benchmark dataset for contextual understanding and reasoning tasks across multiple domains
