Open Thoughts - 114k
An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles. This data was used to train the OpenThinker-7B model, whose results are below. The numbers reported in the table below were evaluated with our open-source tool, Evalchemy.
Dataset Information
- Homepage: Open Thoughts - 114k
Dataset Description
Open Thoughts - 114k is a synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles. It was used to train the OpenThinker-7B model, and the reported numbers for that model were evaluated with the open-source tool Evalchemy.
Dataset Summary
The dataset consists of 114,000 examples, each with a question, a context, and an answer. The questions are designed to test reasoning abilities in math, science, code, and puzzles. The context is a paragraph of text that provides the necessary information to answer the question. The answer is a single word or phrase that is the correct response to the question.
Supported Tasks and Leaderboards
- question-answering: The dataset can be used for question-answering tasks.
Languages
The text in the dataset is in English.
Dataset Structure
The dataset is structured as follows:
- question: The question that the model must answer.
- context: The context that provides the necessary information to answer the question.
- answer: The correct response to the question.
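The three fields listed above can be expressed as a small typed record. The following is a minimal sketch, assuming each field is a plain string and that the column names match this card exactly. The `Example` type, the `validate_example` helper, and the sample record are hypothetical and invented for illustration; the published files may use different column names, so inspect them before relying on this shape.

```python
from typing import TypedDict


class Example(TypedDict):
    """One record, using the field names given in this card (assumed to be strings)."""
    question: str
    context: str
    answer: str


REQUIRED_FIELDS = ("question", "context", "answer")


def validate_example(record: dict) -> Example:
    """Return the record as an Example, or raise ValueError if a field is missing or not a string."""
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], str):
            raise ValueError(f"field {field!r} must be a string")
    return Example(
        question=record["question"],
        context=record["context"],
        answer=record["answer"],
    )


# Hypothetical record, made up for illustration; not taken from the dataset.
sample = {
    "question": "What is the sum of the two numbers?",
    "context": "The first number is 2 and the second number is 3.",
    "answer": "5",
}
example = validate_example(sample)
```

Running a check like this over a handful of records is a quick way to confirm whether the actual columns match the description before building a pipeline around them.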
Data Splits
The dataset is not split into training, validation, and test sets. It is a single dataset with 114,000 examples.
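Because no predefined splits are provided, anyone who needs held-out data has to carve it out themselves. The following is a minimal sketch of a reproducible index split using only the standard library. The `split_indices` function, the 5% validation and test fractions, and the seed are assumptions chosen for illustration, not a recommendation from the dataset authors.

```python
import random


def split_indices(n_examples, val_fraction=0.05, test_fraction=0.05, seed=0):
    """Shuffle indices 0..n_examples-1 with a fixed seed and cut them into train/val/test lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    indices = list(range(n_examples))
    # A dedicated Random instance keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)

    n_val = int(n_examples * val_fraction)
    n_test = int(n_examples * test_fraction)

    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


# 114,000 is the example count stated in this card.
train_idx, val_idx, test_idx = split_indices(114_000)
```

The resulting index lists can then be used to select rows with whatever loader is in use. Fixing the seed means the same examples land in the same split on every run, which keeps evaluation numbers comparable.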
Hugging Face Dataset Card
https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k