Open Thoughts - 114k

Posted on 2021-06-01. Edited on 2025-05-17. In datasets, open-thoughts.

Open Thoughts - 114k is an open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles. This data was used to train the OpenThinker-7B model, whose results are below. The numbers reported in the table below were evaluated with our open-source tool Evalchemy.

Dataset Information

Homepage: Open Thoughts - 114k

Dataset Description

The dataset contains 114k high-quality synthetic reasoning examples spanning math, science, code, and puzzles. It was used to train the OpenThinker-7B model.

Dataset Summary

The dataset consists of 114,000 examples, each with a question, a context, and an answer. The questions are designed to test reasoning abilities in math, science, code, and puzzles. The context is a paragraph of text that provides the necessary information to answer the question. The answer is a single word or phrase that is the correct response to the question.

Supported Tasks and Leaderboards

question-answering: The dataset can be used for question-answering tasks.
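
Because each answer is described above as a single word or phrase, one simple way to score a question-answering model is exact match after light normalization. The sketch below is an illustration only: the function names `normalize` and `exact_match` are hypothetical, and this is not the scoring used by Evalchemy. Normalization is deliberately conservative (case and whitespace only), since punctuation can carry meaning in math answers such as "-5" or "3.14".

```python
def normalize(text: str) -> str:
    """Case-fold and collapse runs of whitespace; punctuation is left untouched."""
    return " ".join(text.casefold().split())


def exact_match(predictions, references) -> float:
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical predictions and references, invented for illustration.
score = exact_match(["Paris", " 42 ", "-5"], ["paris", "42", "5"])
print(score)
```

In this example the first two pairs match after normalization and the third does not, because the sign is preserved.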

Languages

The text in the dataset is in English.

Dataset Structure

The dataset is structured as follows:

question: The question that the model must answer.
context: The context that provides the necessary information to answer the question.
answer: The correct response to the question.
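
As a minimal sketch, a record with the three fields listed above could be represented and validated as follows. The field names come from this article; the `Example` class, the `parse_example` helper, and the sample record are hypothetical and are not taken from the dataset itself. It may be worth checking the column names of the published dataset before relying on this layout.

```python
from dataclasses import dataclass

FIELDS = ("question", "context", "answer")


@dataclass(frozen=True)
class Example:
    """One record, using the three field names listed in this article."""
    question: str
    context: str
    answer: str


def parse_example(raw: dict) -> Example:
    """Build an Example from a raw dict, failing loudly on missing or empty fields."""
    missing = [name for name in FIELDS if not raw.get(name)]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    return Example(**{name: raw[name] for name in FIELDS})


# Hypothetical record, invented for illustration only.
sample = parse_example({
    "question": "What is 2 + 3?",
    "context": "Addition combines two numbers into their sum.",
    "answer": "5",
})
print(sample.question, "->", sample.answer)
```

Extra keys in the raw dict are ignored, so the helper keeps working if a record carries additional metadata.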

Data Splits

The dataset is not split into training, validation, and test sets. It is a single dataset with 114,000 examples.
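
Since no splits are provided, anyone who needs a held-out set has to carve one out. The sketch below shows one possible way to do that reproducibly with a seeded shuffle over row indices. The function name `split_indices`, the 5% fractions, and the seed are assumptions chosen for illustration, not a split recommended by the dataset authors.

```python
import random


def split_indices(n, val_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test index lists."""
    if n < 0 or val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("invalid size or fractions")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global random state is untouched
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


# 114_000 is the example count stated in this article.
train_idx, val_idx, test_idx = split_indices(114_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three lists are disjoint and together cover every row, and calling the function again with the same seed returns the same split. The index lists can then be used to select rows from whatever container holds the examples.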

Hugging Face Dataset Card

https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k