Walstad and Wagner (2016) Pretest/Posttest Disaggregation Command Line Tool

A command line tool to disaggregate pretest and posttest responses into Walstad & Wagner learning types

View the Project on GitHub tazzben/WW

In the spring of 2016, Walstad and Wagner released a paper suggesting that the pretest/posttest delta is insufficient for assessing learning outcomes and that responses should instead be disaggregated into four learning types: positive, retained, zero, and negative learning. However, performing such a disaggregation is time intensive, especially if the questions appear in a different location (or order) on the pretest and posttest.
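To make the four learning types concrete, here is a minimal sketch of how one student's result on one question could be classified. It assumes the usual reading of the type names (for example, positive learning means wrong on the pretest and right on the posttest). The function name and the sample results are made up for illustration and are not taken from WW_out.

```python
from collections import Counter

def learning_type(pre_correct, post_correct):
    """Map one student's pre/post result on one question to a learning type."""
    if pre_correct and post_correct:
        return "rl"  # retained learning: right on both tests
    if not pre_correct and post_correct:
        return "pl"  # positive learning: wrong, then right
    if pre_correct and not post_correct:
        return "nl"  # negative learning: right, then wrong
    return "zl"      # zero learning: wrong on both tests

# Hypothetical (pretest, posttest) results for five students on one question.
results = [(False, True), (True, True), (True, False), (False, False), (False, True)]
counts = Counter(learning_type(pre, post) for pre, post in results)

# Share of the class falling into each learning type.
shares = {t: counts[t] / len(results) for t in ("pl", "rl", "zl", "nl")}
print(shares)
```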

WW_out is a command line tool that makes this disaggregation easy. It uses four CSV files to generate outcomes by question and student. The first file specifies the mapping between the assessment questions and their locations on the pretest and posttest. The second file is a list of student IDs to use in the assessment. The last two files are the Scantron files as routinely received from the testing center.

Software Builds

To make this software easy to use, I've provided two pre-built versions, one for Windows and one for Mac:

  1. Windows
  2. Mac

Additionally, I've provided four example CSV files here.

Installing using MacPorts

As an alternative to the software builds above, a user of MacPorts can install the command line tool by typing the following into the terminal:

sudo port -v selfupdate

sudo port install WW

However, as MacPorts itself can be time-consuming to install, a user should likely use the pre-built version above unless they already have MacPorts installed.

Usage Instructions

Using the sample data files, we can walk through the process of running the command line software. The zip file contains four data files.

Exams 1 and 2 are in the raw Scantron format. For these sample files, I've made up some students and responses. Because the data is fake, the results don't make much sense, but we need a set of files that can be shared publicly.

Students.csv is simply a list of student IDs to use in the assessment process. In many cases you might simply copy the IDs from one of the exams. However, this file also allows you to analyze a subset of the students, such as majors.

Finally, assessment_questions.csv contains the question numbers for each of the questions you wish to analyze. The first column is the assessment question ID. The column Exam1 specifies the question's number on the first exam (or pre-test); Exam2 specifies its number on the second exam. This way, the question positions on each exam don't have to match. The column options specifies the number of available answers for each question, e.g. '4' on a four-option multiple choice exam.
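As an illustration of the mapping file, the sketch below reads a small made-up assessment_questions.csv into a lookup table. The column names Exam1, Exam2 and options come from the description above. The header of the first column ("Question") and all of the values are assumptions, so the code takes the ID from the first column by position rather than by name.

```python
import csv
import io

# Made-up mapping file. "Question" is an assumed header for the first column.
sample = """Question,Exam1,Exam2,options
1,3,10,4
2,7,2,4
3,12,5,5
"""

def read_mapping(handle):
    """Return {question_id: (pretest_position, posttest_position, n_options)}."""
    mapping = {}
    for row in csv.DictReader(handle):
        question_id = next(iter(row.values()))  # first column, whatever its header
        mapping[question_id] = (
            int(row["Exam1"]),
            int(row["Exam2"]),
            int(row["options"]),
        )
    return mapping

mapping = read_mapping(io.StringIO(sample))
print(mapping["1"])
```

Here assessment question 1 sits at position 3 on the first exam and position 10 on the second, which is the situation the mapping file is meant to handle.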

Scantron Format

The software assumes the two exam/Scantron files have the following characteristics:

  1. The first row is the answer key
  2. The second column is the student ID
  3. The third column of the first row specifies the number of questions
  4. The fourth column and higher are the student answers

The software will compare the answer key (first row) to each student's answer thereby determining if the student correctly answered the question.
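The layout above can be sketched in a few lines. This is only an illustration of the comparison against the answer key, not WW_out's actual code. The sample rows are invented, and the values placed in the first column and in the key row's ID cell are placeholders, since the description does not say what those cells contain.

```python
import csv
import io

# Invented exam file: row 1 is the key, column 2 the student ID,
# column 3 of row 1 the number of questions, columns 4+ the answers.
sample = """KEY,0,3,A,C,B
x,1001,,A,C,D
x,1002,,B,C,B
"""

def score_exam(handle):
    """Return {student_id: [True/False per question]} by comparing to the key."""
    rows = list(csv.reader(handle))
    key_row = rows[0]
    n_questions = int(key_row[2])        # third column of the first row
    key = key_row[3:3 + n_questions]     # fourth column onward
    scores = {}
    for row in rows[1:]:
        student_id = row[1]              # second column
        answers = row[3:3 + n_questions]
        scores[student_id] = [a == k for a, k in zip(answers, key)]
    return scores

scores = score_exam(io.StringIO(sample))
print(scores["1001"])
```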

Preparing to run the Program

WW_out is a command line program that can be run from the command prompt of Windows or the terminal of Mac OS X. The first step to running the program is to place both the executable file for your platform and the data files in the same directory (folder). For instance, let's assume you've created a directory named "WW" on your desktop. Using the extraction tool of your choice, you would copy the file "WW_out" and all assessment files to that folder. With that, you should start your terminal program. In Windows you can start the command prompt by typing "CMD" into the Windows search box. On the Mac, you can type "Terminal" into Spotlight Search.

Running the Program

To run the program, you must first navigate to the directory containing both the program and your assessment files. Suppose you have a directory on your desktop named "WW" with both the program and assessment files. You could navigate to that folder using the following commands:

Windows:

cd Desktop\WW

Mac:

cd ~

cd Desktop/WW
To run the program, type the following into the terminal or command prompt window:

Windows:

ww_out -a "assessment_questions.csv" -p "exam1.csv" -f "exam2.csv" -s "students.csv"

Mac:

./ww_out -a "assessment_questions.csv" -p "exam1.csv" -f "exam2.csv" -s "students.csv"

"-a" specifies the file containing the mapping between the assessment questions and the two exams. "-p" specifies the CSV file containing the pre-test in the Scantron format. "-f" specifies the CSV file containing the post-test in the Scantron format. Finally, "-s" specifies the file with the list of student IDs.

By default, the program assumes the filenames for each option are as indicated above. Therefore, if you are uncomfortable working in the command prompt, you can simply name the pretest "exam1.csv", the posttest "exam2.csv", the student list "students.csv" and the question mapping "assessment_questions.csv". Then place all of your files and the ww_out program in the same folder and double-click the ww_out program.
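If you rely on the default filenames, it can help to confirm that all four files are in the folder before double-clicking. The following is a small optional sketch for that check. The helper name is made up, and it is not part of WW_out.

```python
from pathlib import Path

# Default input names stated above.
DEFAULT_INPUTS = [
    "assessment_questions.csv",
    "exam1.csv",
    "exam2.csv",
    "students.csv",
]

def missing_inputs(folder):
    """List the default input files that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in DEFAULT_INPUTS if not (folder / name).is_file()]

# Check the current directory; an empty list means all four files were found.
print(missing_inputs("."))
```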

Running this command will result in five files:

  1. Walstad_Wagner_types.csv - The disaggregated outcomes by assessment question number
  2. Walstad_Wagner_types_by_student.csv - The disaggregated outcomes by student ID
  3. Walstad_Wagner_types_by_student_group.csv - The disaggregated outcomes by student ID grouped by number of options on each question
  4. Questions_output.csv - Outcomes by assessment question on each exam
  5. Student_output.csv - Individual student performance on assessment questions on each exam

In all Walstad Wagner files, you will find the raw disaggregated learning types as well as columns labeled 'gamma', 'alpha', 'mu', and 'flow'. These correspond to "corrected" measurements of the learning types that account for students guessing. γ (gamma) is corrected positive learning, α (alpha) is corrected negative learning, μ (mu) is corrected pre-test stock knowledge (corrected retained plus corrected negative learning), and flow is the corrected pretest/posttest delta (γ-α).

The corrected values are computed from the raw learning type values pl (positive learning), rl (retained learning), zl (zero learning), and nl (negative learning), together with n, the number of answer options. It is important to use these corrected values because the raw scores can be sensitive to the percentage of the class guessing. A paper detailing this adjustment can be found on SSRN.
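To give a feel for the kind of correction involved, here is a sketch derived from a simple guessing model. The model is an assumption for illustration: a share μ of the class knows the answer on the pretest, a share γ learns it by the posttest, a share α loses it, and anyone who does not know the answer guesses uniformly among the n options. Solving that model for μ, γ and α gives the expressions in the code. They may not match the exact equations WW_out uses, so treat the SSRN paper as the authority.

```python
def corrected_values(pl, rl, zl, nl, n):
    """Guessing-corrected gamma, alpha, mu and flow under the assumed model."""
    total = pl + rl + zl + nl
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the four raw shares should sum to 1")
    # Pretest share correct is mu + (1 - mu) / n = rl + nl; solve for mu.
    mu = (n * (rl + nl) - 1) / (n - 1)
    # Solving the model's expressions for pl and nl (with zl eliminated).
    gamma = n * (n * pl + rl + nl - 1) / (n - 1) ** 2
    alpha = n * (n * nl + pl + rl - 1) / (n - 1) ** 2
    return {"gamma": gamma, "alpha": alpha, "mu": mu, "flow": gamma - alpha}

# Raw shares generated from the assumed model with
# mu = 0.5, gamma = 0.3, alpha = 0.1 and four answer options.
values = corrected_values(pl=0.2625, rl=0.5125, zl=0.1125, nl=0.1125, n=4)
print(values)
```

Under this model the raw positive learning share (0.2625) understates γ (0.3) and the raw negative learning share (0.1125) overstates α (0.1), which is why the raw types alone can mislead when many students guess.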

Optional feature: In the "assessment_questions" file, it is possible to have a column named "group" where each question is assigned to one or more integer groups (e.g. "1" or "1,3"). You can then run the analysis on a single group by specifying the command line option "group". This is useful when questions are grouped into student learning outcomes (SLOs) and some questions belong to more than one group.
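The sketch below shows one way a "group" cell such as "1,3" could be interpreted when selecting questions for a single group. The question IDs and helper names are hypothetical, and WW_out's own parsing may differ.

```python
def parse_groups(cell):
    """Turn a group cell such as "1" or "1,3" into a set of integers."""
    return {int(part) for part in cell.split(",") if part.strip()}

def questions_in_group(groups, wanted):
    """Return the question IDs whose group cell includes `wanted`."""
    return [q for q, cell in groups.items() if wanted in parse_groups(cell)]

# Hypothetical question IDs mapped to their raw "group" cell.
question_groups = {"Q1": "1", "Q2": "1,3", "Q3": "2", "Q4": "3"}

print(questions_in_group(question_groups, 3))
```

Question Q2 belongs to groups 1 and 3, so it is picked up when either group is analyzed.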

Note on files: While all of the Walstad Wagner output files use matched pairs, the "Questions" output does not. Therefore, the pretest and posttest results may not match if some students listed in the ID file took only one of the two exams.
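The difference between matched and unmatched students can be illustrated with set operations on made-up student IDs. This is only a sketch of the idea, not the program's implementation.

```python
# Made-up IDs: the student list and the IDs found on each exam.
student_list = {"1001", "1002", "1003", "1004"}
pretest_ids = {"1001", "1002", "1003"}
posttest_ids = {"1001", "1003", "1004", "1005"}

# Matched pairs: listed students who took both exams.
matched = student_list & pretest_ids & posttest_ids

# Listed students who took only one exam; they can affect per-exam
# question results without appearing in any matched-pair output.
pre_only = (student_list & pretest_ids) - posttest_ids
post_only = (student_list & posttest_ids) - pretest_ids

print(sorted(matched), sorted(pre_only), sorted(post_only))
```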

Note on encoding: While very rare, it is possible to encounter empty results in the output files if the program cannot read the input files. In most cases this is due to an advanced character encoding. If you encounter this issue, please save your input files in a more basic character encoding.
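If your spreadsheet program does not offer a choice of encoding when saving, a file can be re-saved with a short script. The sketch below assumes the problem file is UTF-16 and that plain UTF-8 counts as "more basic". Both are assumptions, since the encodings WW_out accepts are not listed here, and the file names in the usage comment are hypothetical.

```python
from pathlib import Path

def resave_as_utf8(source, target, source_encoding="utf-16"):
    """Rewrite a text file as UTF-8 without a byte order mark."""
    text = Path(source).read_text(encoding=source_encoding)
    Path(target).write_text(text, encoding="utf-8")

# Example usage with hypothetical file names:
# resave_as_utf8("exam1_original.csv", "exam1.csv")
```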