Multiplatform command line tool to disaggregate and adjust value-added learning scores

A command line tool to disaggregate Scantron or ZipGrade pre- and post-test responses into Walstad & Wagner learning types

The project is hosted on GitHub at tazzben/WW.


In the spring of 2016, Walstad and Wagner released a paper suggesting that the pre-test/post-test delta is insufficient for assessing learning outcomes and proposing that scores be disaggregated into separate learning types. However, performing such a disaggregation manually is time-intensive, especially if the questions appear in different locations (or orders) on the pre- and post-test.

WW_out is a command line tool that makes this disaggregation easy. It uses four CSV files to generate outcomes by question and by student. The first file specifies the mapping between the assessment questions and their locations on the pre- and post-test. The second file is a list of student IDs to use in the assessment. The last two files hold the pre- and post-test results, either as Scantron files as routinely received from the testing center or as ZipGrade exam output files (exported using the 'standard' export option).

Software Builds

To make this software easy to use, I've provided two pre-built versions: one for Windows and one for Mac:

  1. Windows
  2. Mac

Additionally, I've provided four example CSV files here.

Installing using MacPorts

As an alternative to the software builds above, a MacPorts user can install the command line tool by typing the following into the terminal:

sudo port -v selfupdate

sudo port install WW

However, because MacPorts itself can be time-consuming to install, a user should probably choose the pre-built version above unless they already have MacPorts installed.

Usage Instructions

Using the sample data files, we can walk through the process of running the command line software. The zip file contains four data files. Exams 1 and 2 are in raw Scantron format. For these sample files, I've made up students and their responses. Because the data is fake, the results don't make much sense, but we need a set of files that can be shared publicly. Students.csv is simply a list of student IDs to use in the assessment process. If you omit this file, the program will analyze all students who appear on both the pre- and post-test. When the file is specified, however, you can restrict the analysis to a subset of students, such as majors.
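The student selection rule for the matched-pair Walstad Wagner outputs can be sketched as plain set logic. This is a minimal Python sketch, not WW_out's code: `select_students` is a hypothetical helper, and it assumes IDs are compared as strings (so leading zeros survive) and that a roster only narrows, never widens, the set of students who took both exams.

```python
from typing import Iterable, Optional, Set


def select_students(pre_ids: Iterable[str], post_ids: Iterable[str],
                    roster: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the student IDs used for matched-pair analysis (hypothetical helper).

    Without a roster (no students.csv), keep everyone who took both exams.
    With a roster, keep only roster students who also took both exams.
    """
    matched = set(pre_ids) & set(post_ids)   # students present on both exams
    if roster is None:
        return matched
    return matched & set(roster)             # e.g. restrict to majors
```

For example, with pre-test IDs 1, 2, 3 and post-test IDs 2, 3, 4, the matched set is {2, 3}; a roster of {3, 4} narrows it to {3}.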

Finally, assessment_questions.csv contains the question numbers for each of the questions you wish to analyze. The first column is the assessment question ID. The column Exam1 specifies the question's number on the first exam (or pre-test); Exam2 specifies its number on the second exam (or post-test). This way, the question positions on the two exams don't have to match. The column 'options' specifies the number of available answers on each question, e.g. '4' on a four-option multiple choice exam (this column can contain a non-integer value). The options column can also take a value between 0 and 1, in which case the program treats the value as a probability rather than a count of options. When this file is unspecified, the program assumes the pre- and post-test questions are in the same order, all matched questions should be analyzed, and each question has four options.
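The mapping file and its documented defaults can be sketched as follows. This Python sketch assumes a header row whose first column is the question ID and whose other columns are literally named Exam1, Exam2, and options; `load_mapping` and `default_mapping` are hypothetical helpers, and treating only values strictly between 0 and 1 as probabilities is our assumption about the boundary.

```python
import csv
import io


def load_mapping(text):
    """Parse assessment_questions.csv-style text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    id_col = reader.fieldnames[0]  # first column holds the assessment question ID
    rows = []
    for row in reader:
        # A missing or blank options cell falls back to four options.
        options = float(row.get("options") or 4)
        rows.append({
            "question": row[id_col],
            "pre_position": int(row["Exam1"]),   # number on the pre-test
            "post_position": int(row["Exam2"]),  # number on the post-test
            "options": options,
            # Assumption: values strictly between 0 and 1 are probabilities.
            "is_probability": 0 < options < 1,
        })
    return rows


def default_mapping(num_questions):
    """Mapping used when no file is given: same order, four options each."""
    return [{"question": str(q), "pre_position": q, "post_position": q,
             "options": 4.0, "is_probability": False}
            for q in range(1, num_questions + 1)]
```

To use it on a real file, read the file first, e.g. `load_mapping(open("assessment_questions.csv", newline="").read())`.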

Scantron Format

The software assumes Scantron formatted exam files have the following characteristics:

  1. The first row is the answer key
  2. The second column is the student ID
  3. The third column of the first row specifies the number of questions
  4. The fourth column and higher are the student answers

The software compares the answer key (first row) to each student's answers to determine whether the student answered each question correctly.
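The four Scantron rules above translate into a short scoring routine. This Python sketch is illustrative only: `score_scantron` is a hypothetical helper, answers are compared as exact strings after trimming whitespace, and a missing answer counts as wrong; how WW_out itself treats letter case or blanks is not stated here.

```python
import csv
import io


def score_scantron(text):
    """Score Scantron-style CSV text into {student_id: [1 or 0 per question]}.

    Hypothetical helper for illustration; not WW_out's own code.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    key_row = rows[0]                      # first row is the answer key
    num_questions = int(key_row[2])        # third column gives the question count
    key = key_row[3:3 + num_questions]     # answers start in the fourth column
    scores = {}
    for row in rows[1:]:
        student_id = row[1].strip()        # second column is the student ID
        answers = row[3:3 + num_questions]
        # Pad short rows so a missing answer is scored as wrong.
        answers += [""] * (num_questions - len(answers))
        scores[student_id] = [1 if a.strip() == k.strip() else 0
                              for a, k in zip(answers, key)]
    return scores
```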

ZipGrade Format

If the exam files are ZipGrade formatted, the software assumes the files have the following characteristics:

  1. The first row of the file specifies the column names
  2. There is a column named 'id', 'student id', 'external id', or 'zipgrade id' specifying a unique ID number for each student
  3. Each exam question column is named 'Q' followed by the question number (e.g. 'Q67' for question 67)
  4. In each question column, '0' indicates the student answered the question incorrectly and '1' indicates they answered it correctly. Question columns contain no values other than 0 and 1.
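A reader for files matching these four characteristics might look like the Python sketch below. `score_zipgrade` is a hypothetical helper; matching the ID column names case-insensitively (ZipGrade headers are typically capitalized) and picking the first listed ID column when several are present are our assumptions, not documented WW_out behavior.

```python
import csv
import io
import re

# ID column names listed above; matched case-insensitively (an assumption).
ID_COLUMNS = ("id", "student id", "external id", "zipgrade id")


def score_zipgrade(text):
    """Read ZipGrade-style CSV text into {student_id: {question: 0 or 1}}.

    Hypothetical helper for illustration; not WW_out's own code.
    """
    reader = csv.DictReader(io.StringIO(text))
    by_lower = {name.strip().lower(): name for name in reader.fieldnames}
    # Assumption: if several ID columns exist, the first one listed wins.
    id_col = next((by_lower[c] for c in ID_COLUMNS if c in by_lower), None)
    if id_col is None:
        raise ValueError("no recognised student ID column")
    # Question columns are 'Q' followed by the question number, e.g. 'Q67'.
    question_cols = {}
    for name in reader.fieldnames:
        match = re.fullmatch(r"Q(\d+)", name.strip())
        if match:
            question_cols[name] = int(match.group(1))
    results = {}
    for row in reader:
        # Each question cell holds 1 (correct) or 0 (incorrect).
        results[row[id_col].strip()] = {
            q: int(row[col]) for col, q in question_cols.items()
        }
    return results
```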

Preparing to run the Program

WW_out is a command line program that can be run from the Windows command prompt or the Mac OS X terminal. The first step is to place both the executable file for your platform and the data files in the same directory (folder). For instance, suppose you've created a directory named "WW" on your desktop. Using the extraction tool of your choice, you would extract the file "WW_out" and all assessment files into that folder. Next, start your terminal program: on Windows, open the command prompt by typing "CMD" into the Windows search box; on the Mac, type "Terminal" into Spotlight Search.

Running the Program

To run the program, you must first navigate to the directory containing both the program and your assessment files. If you have a directory on your desktop named "WW" holding both the program and the assessment files, you can navigate to that folder with the following commands.

On Windows:

cd Desktop\WW

On the Mac:

cd ~

cd Desktop/WW

To run the program, type the following from the terminal or command line window.

On Windows:

ww_out -a "assessment_questions.csv" -p "exam1.csv" -f "exam2.csv" -s "students.csv"

On the Mac:

./ww_out -a "assessment_questions.csv" -p "exam1.csv" -f "exam2.csv" -s "students.csv"

"-a" specifies the file containing the mapping between the assessment questions and the two exams. "-p" specifies the CSV file containing the pre-test in Scantron or ZipGrade format. "-f" indicates the file with the post-test in Scantron or ZipGrade format. Finally, "-s" indicates the file with the list of student IDs. You can retrieve a full list of command line options by typing "ww_out --help" on Windows or "./ww_out --help" on the Mac.

By default, the program assumes the filenames for each option are as indicated above. Therefore, if you are uncomfortable working in the command prompt, you can simply name the pre-test "exam1.csv", post-test "exam2.csv", student list "students.csv" and question mapping "assessment_questions.csv". Then place all of your files and the ww_out program in the same folder and double click on the ww_out executable. This technique only works on Windows. However, Mac users can navigate in the terminal to the proper directory and type "./ww_out" (without specifying options) to achieve the same effect.
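If you drive WW_out from a script, the documented flags, defaults, and platform difference can be captured in a small helper. This Python sketch only uses the four flags described above; `build_command` and `DEFAULTS` are hypothetical names, and it assumes the script runs from the folder that holds ww_out.

```python
import sys

# Default filenames for each documented flag.
DEFAULTS = {
    "-a": "assessment_questions.csv",  # question mapping
    "-p": "exam1.csv",                 # pre-test
    "-f": "exam2.csv",                 # post-test
    "-s": "students.csv",              # student ID list
}


def build_command(files=None, platform=None):
    """Return the argument list for running ww_out (hypothetical helper)."""
    platform = platform or sys.platform
    # Windows runs the program by name; the Mac needs ./ for the current folder.
    program = "ww_out" if platform.startswith("win") else "./ww_out"
    options = {**DEFAULTS, **(files or {})}
    command = [program]
    for flag in ("-a", "-p", "-f", "-s"):
        command += [flag, options[flag]]
    return command
```

From the WW folder, `subprocess.run(build_command({"-p": "pretest.csv"}), check=True)` would then run the analysis with a differently named pre-test file and the default names for everything else.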

Running this command will result in five files:

  1. Walstad_Wagner_types.csv - The disaggregated outcomes by assessment question number
  2. Walstad_Wagner_types_by_student.csv - The disaggregated outcomes by student ID
  3. Walstad_Wagner_types_by_student_group.csv - The disaggregated outcomes by student ID grouped by number of options (probability) on each question
  4. Questions_output.csv - Outcomes by assessment question on each exam
  5. Student_output.csv - Individual student performance on assessment questions on each exam
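The raw Walstad Wagner types come from classifying each matched pre/post pair: wrong then right is positive learning (pl), right then right is retained learning (rl), wrong then wrong is zero learning (zl), and right then wrong is negative learning (nl). The Python sketch below shows that classification for one question; `learning_types` is a hypothetical helper, and reporting the types as proportions of matched students is our assumption about the output's scale.

```python
from collections import Counter


def learning_types(pre, post):
    """Classify matched pre/post scores (1 = correct) for one question.

    pre and post map student IDs to 0 or 1. Only students present in both
    are counted (matched pairs). Returns proportions that sum to 1.
    Hypothetical helper for illustration; not WW_out's own code.
    """
    matched = pre.keys() & post.keys()
    if not matched:
        raise ValueError("no matched students")
    labels = {(0, 1): "pl", (1, 1): "rl", (0, 0): "zl", (1, 0): "nl"}
    counts = Counter(labels[(pre[s], post[s])] for s in matched)
    return {t: counts[t] / len(matched) for t in ("pl", "rl", "zl", "nl")}
```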

In all Walstad Wagner files, you will find the raw disaggregated learning types as well as columns labeled 'gamma', 'alpha', 'mu', and 'flow'. These are "corrected" measurements of the learning types that account for students who guess. γ (gamma) is corrected positive learning, α (alpha) is corrected negative learning, μ (mu) is corrected pre-test stock knowledge (corrected retained learning plus corrected negative learning), and flow is the corrected pre-test/post-test delta (γ − α). Formally, the following equations are used to find the corrected values:

where pl (positive learning), rl (retained learning), zl (zero learning), and nl (negative learning) are the raw learning type values and n is the number of answer options. It is important to use these corrected values because the raw scores can be sensitive to the share of the class that is guessing. A paper detailing this adjustment can be found on SSRN.
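To see why guessing distorts the raw types, here is a sketch under one simple model that is our assumption, not necessarily the model behind WW_out: the raw types are proportions summing to 1, and a student who does not know an answer picks one of the n options uniformly at random. Solving that model for the underlying shares gives the closed forms in the code; `corrected_values` is a hypothetical helper, and the SSRN paper remains the authoritative statement of the adjustment.

```python
def corrected_values(pl, rl, zl, nl, n):
    """Guessing-adjusted learning types under an ASSUMED guessing model.

    Assumptions (illustration only): pl, rl, zl, nl are proportions summing
    to 1, and a student who does not know an answer guesses uniformly among
    n options. Hypothetical helper, not WW_out's own code.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    # Zero learning is observed only when a non-knower misses both guesses,
    # so it measures the guessing noise that leaks into pl and nl.
    gamma = n * ((n - 1) * pl - zl) / (n - 1) ** 2   # corrected positive learning
    alpha = n * ((n - 1) * nl - zl) / (n - 1) ** 2   # corrected negative learning
    # Pre-test correct = known + (1 - known) / n, solved for "known".
    mu = (n * (rl + nl) - 1) / (n - 1)               # corrected pre-test knowledge
    flow = n * (pl - nl) / (n - 1)                   # equals gamma - alpha
    return {"gamma": gamma, "alpha": alpha, "mu": mu, "flow": flow}
```

For example, if 30% of students truly learn, 10% truly forget, 40% truly retain, and 20% never know the answer on a four-option question, this model predicts raw values pl = 0.2625, rl = 0.5125, zl = 0.1125, and nl = 0.1125; passing those with n = 4 recovers gamma = 0.3, alpha = 0.1, mu = 0.5, and flow = 0.2 (up to floating-point rounding). If the options column holds a probability p instead of a count, using n = 1/p is a natural mapping under this model, though that too is an assumption.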

Most files are self-explanatory; however, the difference between the two student-level Walstad Wagner files can be confusing. In 'Walstad_Wagner_types_by_student_group.csv', the corrected values are calculated for each student separately within each group of questions that share the same 'Options' value (e.g. 4). These results are presented along with the number of observations in each options group. For instance, suppose a paired pre-/post-test has 10 multiple choice questions, half with four options and half with five. This file would contain two rows per student: one for the four-option questions and one for the five-option questions. 'Walstad_Wagner_types_by_student.csv' is a weighted average of the student group file: once the learning types are calculated for each options group, each learning type is averaged using the number of observations per options group as the weights. In the example above, the two rows per student receive equal weight because each covers five questions. The result is a single row per student.
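The weighting step described above can be sketched in a few lines of Python. `weighted_student_row` is a hypothetical helper, and the input layout (one dict per options group with an "obs" count plus learning-type values) is our assumption about how the group rows might be held in memory.

```python
def weighted_student_row(groups):
    """Combine one student's per-options-group rows (hypothetical helper).

    groups: list of dicts, each with an "obs" count and learning-type values.
    Every value is averaged using the observation counts as weights.
    """
    total = sum(g["obs"] for g in groups)
    if total == 0:
        raise ValueError("no observations")
    keys = [k for k in groups[0] if k != "obs"]
    return {k: sum(g[k] * g["obs"] for g in groups) / total for k in keys}
```

With five four-option and five five-option questions, both groups carry weight 5, so each learning type is the simple mean of the two rows; with three questions in one group and one in the other, the first group counts three times as much.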

Optional feature: In the "assessment_questions" file, you can add a column named "group" that assigns each question to one or more integer groups (e.g. "1" or "1,3"). You can then run the analysis on a single group by specifying the command line option "group". This is useful when questions are grouped into student learning outcomes (SLOs) and some questions belong to more than one group.
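Selecting the questions for one group from a "group" cell such as "1" or "1,3" can be sketched as below. `questions_in_group` is a hypothetical Python helper operating on mapping rows read as dicts (for example from `csv.DictReader`); rows with an empty or missing group cell are assumed to belong to no group.

```python
def questions_in_group(mapping, group):
    """Keep mapping rows whose "group" cell (e.g. "1" or "1,3") contains group.

    Hypothetical helper for illustration; not WW_out's own code.
    """
    selected = []
    for row in mapping:
        cell = str(row.get("group") or "")
        # Split comma-separated integers; int() tolerates surrounding spaces.
        groups = {int(part) for part in cell.split(",") if part.strip()}
        if group in groups:
            selected.append(row)
    return selected
```

A question tagged "1,3" is therefore included both when analyzing group 1 and when analyzing group 3.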

Note on files: While all of the Walstad Wagner output files use matched pairs, the "Questions" output does not. Therefore, the pre- and post-test results in that file may not match if some students (listed in the ID file) took only one of the two exams.

Note on encoding: In rare cases, the output files may be empty because the program cannot read the input files. This is usually caused by an advanced character encoding. If you encounter this issue, please save your input files in a more basic character encoding.

Help: If you run into an issue or have difficulty using the program, please let me know. I can be reached at