Solving Math Word Problems

We’ve trained a system that solves grade school math problems with nearly twice the accuracy of a fine-tuned GPT-3 model. It solves about 90% as many problems as real kids: a small sample of 9- to 12-year-olds scored 60% on a test from our dataset, while our system scored 55% on those same problems. This is important because today's AI is still quite weak at commonsense multistep reasoning, which is easy even for grade school kids. We achieved these results by training our model to recognize its mistakes, so that it can try repeatedly until it finds a solution that works.


Introduction

Large language models like GPT-3 have many impressive skills, including their ability to imitate many writing styles, and their extensive factual knowledge. However, they struggle to perform tasks that require accurate multistep reasoning, like solving grade school math word problems. Although the model can mimic the cadence of correct solutions, it regularly produces critical errors in logic.

To match human performance in complex logical domains, our models must learn to recognize their mistakes and to choose their steps carefully. To that end, we train verifiers to evaluate whether or not a proposed solution is correct. To solve a new problem, we use verifiers to select the best among many proposed solutions. We collected the new GSM8K dataset to evaluate our methods, and we are releasing this dataset to facilitate research.

In the ten examples below, we show solutions generated by our new method, verification, and our baseline method, fine-tuning.

GSM8K Dataset

GSM8K consists of 8.5K high-quality grade school math word problems. Each problem takes between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer. Fine-tuned state-of-the-art language models perform poorly on this dataset, primarily due to the high diversity of problems. At the same time, GSM8K solutions depend only on elementary concepts, so achieving high test performance is a tractable goal.

Solutions in GSM8K are written as natural language rather than as pure math expressions. By sticking to natural language, model-generated solutions are more readily interpretable by humans, and our methods remain relatively domain agnostic.
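To make the format concrete, the released dataset stores each problem as a JSON line containing a natural-language question and a step-by-step answer whose last line gives the final number after a "####" delimiter. The snippet below is a minimal, hypothetical loader under that assumption; the file name train.jsonl and the exact parsing regex are illustrative, not taken from the text above.

import json
import re

# Minimal sketch of reading GSM8K examples (assumes the released JSONL format,
# where each line has "question" and "answer" fields and the ground-truth
# answer ends in "#### <number>"; the file name below is illustrative).
def load_gsm8k(path):
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            question, solution = example["question"], example["answer"]
            # The final numeric answer follows the "####" delimiter.
            match = re.search(r"####\s*(-?[\d,.]+)", solution)
            final_answer = match.group(1).replace(",", "") if match else None
            yield question, solution, final_answer

for question, solution, final_answer in load_gsm8k("train.jsonl"):
    print(question)
    print("final answer:", final_answer)
    break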

Training Verifiers: Models that Learn from their Mistakes

One significant challenge in mathematical reasoning is the high sensitivity to individual mistakes. Autoregressive models, which generate each solution token by token, have no mechanism to correct their own errors. Solutions that veer off-course quickly become unrecoverable, as can be seen in the examples provided.

We address this problem by training verifiers to evaluate the correctness of model-generated solutions. Verifiers are given many possible solutions, all written by the model itself, and they are trained to decide which ones, if any, are correct.
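As a rough illustration of how such training data could be assembled (a sketch, not the exact recipe from the paper), one can sample many solutions per training problem from the fine-tuned generator and label each by whether its extracted final answer matches the ground truth. The helpers generate_solutions and extract_answer below are hypothetical placeholders.

# Hypothetical sketch of assembling verifier training data: sample many
# solutions per problem from the fine-tuned generator and label each one by
# whether its extracted final answer matches the ground truth.
# `generate_solutions` and `extract_answer` are illustrative placeholders.
def build_verifier_data(problems, generate_solutions, extract_answer, samples_per_problem=100):
    data = []
    for question, gold_answer in problems:
        for solution in generate_solutions(question, n=samples_per_problem):
            label = float(extract_answer(solution) == gold_answer)  # 1.0 if correct
            data.append((question, solution, label))
    return data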

To solve a new problem at test time, we generate 100 candidate solutions and then select the solution that is ranked highest by the verifier. Verifiers benefit from this inherent optionality, as well as from the fact that verification is often a simpler task than generation.
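A minimal sketch of that test-time procedure, with generate_solutions and verifier_score as stand-ins for the generator and the trained verifier:

# Sample n candidate solutions and return the one the trained verifier
# scores highest. `generate_solutions` and `verifier_score` are placeholders
# for the generator and verifier models.
def solve(question, generate_solutions, verifier_score, n=100):
    candidates = generate_solutions(question, n=n)
    return max(candidates, key=lambda solution: verifier_score(question, solution))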

We find that we get a strong boost in performance from verification, as long as the dataset is large enough. With datasets that are too small, we believe that the verifiers overfit by memorizing the final answers in the training set, rather than learning any more useful properties of mathematical reasoning.

On the full training set, verification with a 6B parameter model slightly outperforms a fine-tuned 175B parameter model, giving a performance boost that is approximately equivalent to a 30x model size increase. Moreover, verification appears to scale more effectively with additional data, if we extrapolate based on current results.

Conclusion

Producing correct arguments and recognizing incorrect ones are key challenges in developing more general AI. Grade school math is an ideal testbed for these capabilities. The problems in GSM8K are conceptually simple, yet one subtle mistake is enough to derail an entire solution. Identifying and avoiding such mistakes is a crucial skill for our models to develop. By training verifiers, we teach our models to separate the good solutions from the ones that didn’t quite work out. We expect these skills to become increasingly relevant as we attempt to apply our models to more logically complex domains.


Acknowledgments

Thanks to the team at Surge AI for performing the GSM8K data collection.

Thanks to our paper co-authors: Mohammad Bavarian, Jacob Hilton, Reiichiro Nakano, and Christopher Hesse.

Thanks to those who provided feedback on this release: Dan Hendrycks, Leo Gao, Matthias Plappert, Lukasz Kaiser, Alec Radford, Giambattista Parascandolo, Harri Edwards, Yura Burda, Nick Ryder, Ilya Sutskever, Mira Murati, Sam Altman, Aris Konstantinidis, Andrew Mayne, Hannah Wong, and Steve Dowling.

Thank you to the students who volunteered to take our test!

source https://openai.com/blog/grade-school-math/

The 3 Best Free AI Courses Online Today [2021]

AI is estimated to contribute $15.7 trillion to the world economy by 2030, according to PwC.

And, in 2021 alone, Gartner projects AI augmentation will create $2.9 trillion of business value and 6.2 billion hours of worker productivity globally.

That’s a disruptive level of impact which will be felt by everyone, no matter what job or field you’re in.

It doesn’t matter if you’re a student looking to enter a hot computer science field or a seasoned professional with no interest in coding…

You will need to understand AI at some level—either to build it or harness its power in your career.


How Search Engines Use AI

Search engines are incredible.

You pop onto your favorite one and type in a few keywords…

Then, magically the search engine has scoured the entire internet to find the most relevant result.

Except it’s not magic, of course.


AI Versus Machine Learning: What’s the Difference?

What is the difference between artificial intelligence and machine learning?

Don’t worry…

You don’t need to be a rocket scientist to learn.

You just need a clear definition of artificial intelligence and machine learning, then a simple distinction between the two.

We’ve got you covered.
