Newsela - Data Science in Education

Today’s guest speakers at Metis Data Science were Sonia Mehta, data analyst fellow, and Dan Cogan-Drew, co-founder of Newsela.

Our guests began with an introduction of Newsela, which is and education startup launched in 2013 focused on reading learning. Their approach is to publish top news articles each day from different disciplines and translate them “vertically” down to more basic levels of english. The goal is to provide teachers with an adaptive tool for teaching students to read while providing students with rich learning material that is informative. They also provide a web platform with user interaction to allow students to annotate and comment. Articles are selected translated by an in-house editorial staff.

Sonia Mehta is data analyst who joined Newsela in August. In terms of data, Newsela tracks all kinds of information for each individual. They are able to track each student’s average reading rate, what level they choose to read at, and whether they are successfully answering the quizzes for each article.

She opened with a question regarding what challenges we faced before performing any kind of analysis. It turns out that cleaning and formatting data is a huge problem. Newsela has 24 million rows of data in their databas, and gains close to 200,000 data points a day. With that much data, questions arise about how to properly segment it data. Should they be segmented by recency? student grade? reading time? Newsela also accumulates a lot of quiz data on students. Sonia was interested in finding out which quiz questions are most easy/difficult, which subjects are most/least interesting. On the product development side, she was interested in what reading strategies can they tell teachers to help students be better readers.

Sonia gave an example for one analysis she performed by looking at typical reading time of a student. The average reading time per article for students is on the order of 10 minutes, but before she could look at overall statistics, she had to remove outliers that spent 2-3+ hours reading a single article. Only after removing outliers could she discover that students at or above grade level spent about 10% (~1min) more time reading an article. This observation remained true when cut across 80-95% percentile of readers in in their population. The next step would be to look at whether these high performing students were annotating more than the lower performing students. All of this leads into identifying good reading strategies for teachers to pass on to help improve student reading levels.

Newsela had a very creative learning platform they designed and Sonia’s presentation provided lots of insight into challenges faced in a production environment. It was an interesting look into how data science can be used to better inform teachers at the K-12 level, something I hadn’t considered before.

Written on October 21, 2014