Open Thoughts - 114k

An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles. This data was used to train the OpenThinker-7B model, whose results are below. The numbers in the table below were obtained with our open-source evaluation tool, Evalchemy.

Dataset Information

Dataset Summary

The dataset consists of 114,000 examples, each with a question, a context, and an answer. The questions are designed to test reasoning abilities in math, science, code, and puzzles. The context is a paragraph of text that provides the necessary information to answer the question. The answer is a single word or phrase that is the correct response to the question.

Supported Tasks and Leaderboards

  • question-answering: The dataset can be used for question-answering tasks.
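Because each answer is described as a single word or phrase, one simple way to score a question-answering model is normalized exact match. The sketch below is only an illustration under that assumption: the function names are hypothetical, and it is not the metric implemented by Evalchemy.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when the two strings are equal after normalization."""
    return normalize(prediction) == normalize(reference)


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact match is strict: a correct answer phrased differently from the reference counts as a miss, so it is best treated as a lower bound on answer quality.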

Languages

The text in the dataset is in English.

Dataset Structure

Each example has the following fields:

  • question: The question that the model must answer.

  • context: The context that provides the necessary information to answer the question.

  • answer: The correct response to the question.
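The three fields above can be modeled as a small typed record. This is a minimal sketch that assumes every field is a plain string; the `Example` class, the `parse_example` helper, and the sample row are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass

# Field names as listed in this card.
FIELDS = ("question", "context", "answer")


@dataclass(frozen=True)
class Example:
    question: str  # the question the model must answer
    context: str   # text providing the information needed to answer
    answer: str    # the correct response


def parse_example(row: dict) -> Example:
    """Build an Example from a raw row, failing loudly if a field is missing."""
    missing = [name for name in FIELDS if name not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return Example(**{name: str(row[name]) for name in FIELDS})


# Made-up row for illustration only.
row = {
    "question": "What is 2 + 3?",
    "context": "Addition combines two numbers into their sum.",
    "answer": "5",
}
example = parse_example(row)
```

Validating rows at load time surfaces schema mismatches early, before they turn into confusing errors later in a training or evaluation pipeline.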

Data Splits

The dataset is not split into training, validation, and test sets. It is a single dataset with 114,000 examples.
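Since the data ships as a single set, anyone who needs held-out data has to create the split themselves. The sketch below shows one way to do so reproducibly by shuffling example indices with a fixed seed. The 5% validation and 5% test fractions and the `split_indices` name are assumptions chosen for illustration, not a recommendation from the dataset authors.

```python
import random


def split_indices(n: int, val_fraction: float = 0.05,
                  test_fraction: float = 0.05, seed: int = 0):
    """Shuffle indices 0..n-1 deterministically and cut them into three disjoint parts."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, does not touch global state
    n_val = round(n * val_fraction)
    n_test = round(n * test_fraction)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test
```

Calling `split_indices(len(dataset))` with the same seed always yields the same partition, so results stay comparable across runs.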

Hugging Face Dataset Card

https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k