Curating High-Quality Datasets
Last updated
The agent will be trained on a wide range of scenarios across these diverse environments, from dynamic game worlds to more structured, task-focused settings. This comprehensive approach to dataset creation will let our agent learn from a rich variety of interactions, environments, and instruction-action pairs, with the aim of developing generalist capabilities across multiple 3D virtual settings.
The foundation will collaborate with game studios to incorporate specialised research environments and create custom-built "simulations", each designed to teach specific skills. Like the ARC-AGI benchmark, HELP is designed to measure an AI system's ability to efficiently acquire new skills outside its training data, which is considered a key aspect of general intelligence. This dataset will also allow the foundation to train its own vision-language models (VLMs).
VLMs currently struggle with fine-grained visual understanding and with reasoning about abstract concepts from visual information. Large language models (LLMs), by contrast, excel at reasoning tasks because of their exposure to vast amounts of diverse text, which lets them capture complex patterns in logic and argumentation; however, text-only LLMs can't process images.
To improve VLMs, HELP is exploring synthetic data pipelines that address these limitations, which stem from a scarcity of high-quality data. Video games are among the best potential sources of rich, interactive training data. The ultimate goal is to pool contributors' efforts to train the fastest VLMs, with an enhanced understanding of spatial and temporal dynamics.