How to Automate Your Analysis with SPSS Statistics Syntax
Have you ever thought about how amazing it would be to automate your statistical analysis?
What is the first thing that comes to mind when you hear the term "repetitive analysis"? Google results are inconsistent and offer no direction. I mostly use the term for statistical jobs where the analyst regularly performs the same preparation, summary and analysis, every time on new data (e.g. monthly survey reports). Recently, while preparing material for a course in SPSS Statistics Syntax, I came to recognise another, far more annoying case of "repetitive analysis": having to redo some of that data prep, summary or analysis because of issues raised only later down the line. Examples include interviewing errors, capturing errors, incomplete data and updated variable properties.
SPSS Statistics Syntax is ideally suited for my initial "definition" of repetitive analysis. You can use it to set up every single step in the entire process, from data import to the report. Once it is set up, you can execute the job from start to finish, all at once (no black boxes involved). It saves hours of clicking-here-clicking-there every month when the new data arrives.
Put simply, syntax instructs SPSS Statistics what to do, in a language it can understand.
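As a rough illustration of such a start-to-finish job, the sketch below opens a data file, runs one summary and exports the output. The file paths and the variable name gender are placeholders rather than details from a real project, and a real monthly job would contain many more steps.

```spss
* Open the data for the current month (hypothetical path).
GET FILE='C:\survey\current_month.sav'.

* Summarise one variable.
FREQUENCIES VARIABLES=gender.

* Export the visible Viewer output to a PDF report (hypothetical path).
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /PDF DOCUMENTFILE='C:\survey\report.pdf'.
```

Next month, only the file path in the first command would need to change before running the whole job again.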
However, it is also perfect for a once-off analysis, because even once-off analyses rarely proceed linearly from data import straight to data prep, straight to analysis, straight to the report.
Analysts who use SPSS Statistics Syntax have turned the Paste button, found in almost every procedure's dialog box, into their bestie. "Paste" pastes all the selections and specifications you've made in the dialog box (e.g. run frequencies on 'gender' and display a bar chart) as a programming instruction in a syntax window. Other tasks are typically performed manually and directly, e.g. editing variable properties such as variable labels, value labels and user-missing values. A little-known fact is that these are also executable via syntax.
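For the frequencies example just mentioned, the pasted instruction would look roughly like the sketch below. It assumes a variable named gender exists in the active dataset, and the exact subcommands that Paste generates can differ with the options ticked in the dialog box and with the software version.

```spss
* Frequencies on gender with a bar chart of counts.
FREQUENCIES VARIABLES=gender
  /BARCHART FREQ
  /ORDER=ANALYSIS.
```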
But why would you even attempt to open this programming can of worms when you can quickly type in the variable label or simply click your way to the right number of decimals? Because of "repetitive analysis". If you need to re-import your data to include a new break variable, you will need to label everything again. If you suddenly need to merge your dataset with the previous 11 months' data, you will need to re-click your way to the right number of decimals. If your client wants the neutral point for all 20 agreement scale questions to read "Neither /nor" instead of "Neutral", you will need to retype 20 value labels.
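Two of the changes above could each be reduced to a single command. This is a sketch only: the variable names income and q1 to q20 are hypothetical, and it assumes that the neutral point is coded 3 and that the 20 questions sit next to each other in the dataset.

```spss
* Display income with a width of 8 and two decimals.
FORMATS income (F8.2).

* Relabel only the neutral point and leave the other value labels untouched.
ADD VALUE LABELS q1 TO q20 3 'Neither /nor'.
```

ADD VALUE LABELS is used here rather than VALUE LABELS because it changes only the values listed, whereas VALUE LABELS replaces the full set of labels for the named variables.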
Besides, using SPSS Statistics Syntax is easier than you think.
Below I give a few examples of editing variable properties via syntax, by providing the instruction and some notes. Feel free to retype them directly into your own syntax window, replacing the variables, values and metadata with your own. But first, just a few guidelines for new users.
Tips on SPSS Statistics Syntax for new users
-It is not case sensitive, except when referring to values of string variables.
-Each instruction starts with a command, which the syntax window displays in bold, navy format (commands are usually typed in capital letters, but this is not required).
-All commands end in ".".
-Comments are very useful for keeping track in plain language of why and what you are doing with a particular command.
-They appear in grey, and must be preceded with a "*" and closed with a ".".
-Variable names can be typed in manually. Alternatively, typos can be avoided by using the Variables button.
-To execute a particular section, simply highlight it and click the Run button.
-To use the syntax examples below, click on File -> New -> Syntax. A brand-new syntax window will open (alive with possibilities…). You can simply retype the commands.
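A few of these tips can be seen in the short sketch below. The variable names gender and region are hypothetical, and region is assumed to be a string variable.

```spss
* This is a comment. It starts with an asterisk and ends with a full stop.

* Commands are not case sensitive so these two lines do the same thing.
frequencies variables=gender.
FREQUENCIES VARIABLES=gender.

* String values are case sensitive so these two flags can differ.
COMPUTE is_north_a = (region = 'North').
COMPUTE is_north_b = (region = 'north').
EXECUTE.
```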
Five examples of SPSS Statistics Syntax
Deleting variables
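A minimal sketch of the instruction, using hypothetical variable names (resp_id, intv_date, intv_code and temp1 to temp5):

```spss
* Delete three variables by listing them one by one.
DELETE VARIABLES resp_id intv_date intv_code.

* Delete a run of consecutive variables with the TO keyword.
DELETE VARIABLES temp1 TO temp5.
```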
"DELETE VARIABLES" is the syntax command. Note the two different ways of listing the variables: either one by one, or with a "to" between the first and last item. The latter option can only be used if you want to apply the command to variables appearing consecutively in the dataset. Lastly, note the use of the two comments.
Renaming variables (pretend that you didn't just delete them…)
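A minimal sketch of the instruction, assuming hypothetical variables v1, v2 and v3 that should become gender, age and region:

```spss
* Rename a single variable.
RENAME VARIABLES (v1 = gender).

* Rename several variables at once. The old and new lists must be the same length.
RENAME VARIABLES (v2 v3 = age region).
```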
Note the format: current_variable_name = new_variable_name. The names themselves are typed without quotes.
Labelling variables
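A minimal sketch of the instruction, with hypothetical variable names and label text:

```spss
* Label two variables in one command.
VARIABLE LABELS gender 'Gender of respondent'
  /age "Respondent's age in years".
```

Double quotes are used for the second label so that the apostrophe inside it is read as part of the label.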
Each label must appear between single or double quotes.
Labelling values
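A minimal sketch of the instruction, assuming hypothetical agreement questions q1 to q20 coded 1 to 5 and three yes/no questions coded 1 and 2:

```spss
* Give 20 consecutive questions the same set of value labels.
VALUE LABELS q1 TO q20
  1 'Strongly disagree'
  2 'Disagree'
  3 'Neutral'
  4 'Agree'
  5 'Strongly agree'.

* Label variables that are not consecutive by listing them one by one.
VALUE LABELS q21 q25 q30 1 'Yes' 2 'No'.
```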
If a set of variables all require the same value labels, they can be labelled simultaneously. Furthermore, if they are consecutive, the "to" keyword can be applied instead of listing them individually. All labels should appear between single or double quotes.
Selecting user-missing values
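A minimal sketch of the instruction, assuming hypothetical numeric variables in which 98 and 99 (or 999) are codes that should be excluded from analysis:

```spss
* Declare two user-missing codes for 20 consecutive questions.
MISSING VALUES q1 TO q20 (98, 99).

* Declare one user-missing code for two variables listed one by one.
MISSING VALUES age income (999).
```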
It's very helpful to understand the difference between system-missing and user-missing values: SPSS Statistics assigns system-missing itself when a numeric cell has no valid value, whereas user-missing values are codes that you declare as missing. User-missing values are listed in parentheses, after the variable(s) they apply to (listed either with "to" or individually). Separate multiple missing values with a comma; up to three individual values can be declared per variable.
Imagine that you have imported a 100-variable dataset into SPSS Statistics. Imagine that you have deleted 15 variables not required for analysis, renamed and labelled the remaining 85 variables, labelled all their values and allocated all their user-missing values. All in a day's work. Imagine you arrive back at work the next morning to find an email in your inbox from the client with the correct dataset, because yesterday's was the wrong one.