Debugging 101
Peter Knapp
U.S. Department of Commerce
Overview

The aim of this paper is to show a beginning user of SAS
how to debug SAS programs. New users often review their
logs only for syntax errors that appear in red. They neglect
to look for other types of coding errors. SAS identifies
non-syntax errors in notes and warnings. Examples include notes
that SAS has found uninitialized variables, generated
missing values, or encountered more than one data set
with repeats of BY values in a MERGE statement. In
addition, once all coding errors are cleaned up, new users
can find that their programs do not produce the desired
results. To produce the desired results they need to uncover
logic errors that are often more difficult to find than coding
errors. By demonstrating various debugging techniques, I
plan to show that with a little practice, one can master the
art of debugging SAS programs.

Understanding How SAS Runs a Program

Before discussing programming errors, having a basic
understanding of how the SAS System runs a program is
important. A SAS program is typically made up of DATA
steps and PROC steps. DATA steps read and modify data
and create SAS data sets. PROC steps use SAS data sets
and perform specific actions with the data.

The component of SAS that runs programs is called the
SAS supervisor. It first checks the syntax of a program by
reading a step and checking the step for syntax errors. If no
errors are encountered, the supervisor compiles the step and
runs data through the compiled code before moving on to
the next step and repeating the process.

For example, to print out a list of home market sales, a
DATA step would read in the data and a PROC PRINT
would print the list. The log of the program HM SALES
looks like this:

1 *** CREATE HOME MARKET SALES ***;
2
3 DATA HMSALES;
4 INPUT CONNUMH $ 1-2 GRSUPRH QTYH PACKH
5 @11 SALEDTH DATE9.;
6 FORMAT SALEDTH DATE9.;
7 LIST;
8 CARDS;
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
9 01 23 7 4 07OCT1998
10 02 17 5 2 06OCT1998
11 03 52 2 8 07OCT1998
12 03 62 5 2 05OCT1998
NOTE: The data set WORK.HMSALES has 4 observations and 5 variables.
NOTE: The DATA statement used 0.05 seconds.
13 RUN;
14
15 /* PRINT HOME MARKET SALES */
16
17 PROC PRINT DATA = HMSALES;
18 TITLE "Home Market Sales Data";
19 RUN;
NOTE: The PROCEDURE PRINT used 0.02 seconds.

The output of the run looks like this:

Home Market Sales Data

OBS    CONNUMH    GRSUPRH    QTYH    PACKH    SALEDTH
  1      01          23        7       4      07OCT1998
  2      02          17        5       2      06OCT1998
  3      03          52        2       8      07OCT1998
  4      03          62        5       2      05OCT1998

In this example, the SAS supervisor reads lines of SAS
code until it sees the key word RUN and knows it is at the
end of a step. Because there are no syntax errors, SAS
compiles the DATA step, the code is executed, and the data
set HMSALES is created. It has four observations and five
variables.

The supervisor then reads additional lines of code until it
sees the second RUN statement. The PROC PRINT is
written correctly so SAS compiles and executes the
procedure. Finally, the job ends as there are no more lines of
code to execute.

Different Kinds of Errors

As already discussed, SAS can produce syntax errors while
it compiles a step. Syntax errors relate directly to the rules
that govern how SAS code is written. For example, all
variable names in SAS can be no longer than eight
characters in length and must start with a letter or
underscore. Trying to create a variable HMQUANTITY
will cause a syntax error because the variable has a length
of ten.

If the submitted step is free of syntax errors, SAS then
executes the step and runs data through it. Depending on
what data is read, SAS may generate execution-time errors.
Execution-time errors occur as SAS is running the step
because of the way the code has been written. Generally
speaking, the error occurs because of the way variables are
defined as opposed to actual data values.

For example, using the variable REBATEH (that has not
been previously defined) in an equation will produce an
uninitialized variable message. Syntactically, there is
nothing wrong with the variable, as it follows the naming
convention of variables. But because it has not been
previously defined, SAS assigns a value of missing to the
variable. If the variable had been explicitly defined, SAS
would not define it as an uninitialized variable.

Another class of errors, actually a subset of
execution-time errors, consists of invalid data errors. Invalid data
errors only occur if the data running through the step causes
SAS to produce an execution-time error.

For example, if GRSUPRH is used in the calculation of
NETPRIH and the value of GRSUPRH is missing for some
observations, the value of NETPRIH for those observations
will be set to missing. NETPRIH will be non-missing for
the other observations.

All three kinds of errors can prevent your program from
working. To make sure your program is working properly,
reviewing the log is very important. SAS logs, in addition to
printing out the submitted code, contain three kinds of
messages: errors, notes, and warnings. Depending on the
kind of error SAS finds, it will print out some combination
of the three kinds of messages. Reading all of the messages
in the log is important, not just the error messages displayed
in red.

A fourth class of errors which can be hard to debug are
logic errors. Sometimes programs do not produce the
desired results after all syntax and execution-time errors
have been cleaned up. These errors can occur because the
program was not designed properly. It is important to
understand the data that will be used by the program and to
account for all possible data values when designing and
writing the program.

For example, a program may be written to produce mailing
labels using a membership database. If the database
contains domestic and international addresses, but the
program is only written with domestic addresses in mind,
there's a good chance that labels for international members
will not print properly.

A. Syntax Errors

If SAS detects a syntax error, it usually underlines the error
(or where it thinks the error is), prints a number below the
underline, and prints that number along with a message at
the bottom of the step. The supervisor then enters syntax
check mode. It continues to read statements, check their
syntax, and underline any additional errors. SAS will run
data through additional steps depending on
where the data is being read from and whether the step is a
DATA or PROC step.

Missing Semicolons

If in the HM SALES program the semicolon in the DATA
statement is left out, SAS produces the following log:

1 *** CREATE HOME MARKET SALES ***;
2
3 DATA HMSALES
4 INPUT CONNUMH $ 1-2 GRSUPRH QTYH PACKH
-
200
-
76
5 @11 SALEDTH DATE9.;
6 FORMAT SALEDTH DATE9.;
7 LIST;
9 CARDS;
ERROR 200-322: The symbol is not recognized.
ERROR 76-322: Syntax error, statement will be ignored.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The DATA statement used 0.78 seconds.
10 RUN;
11
12 /* PRINT HOME MARKET SALES */
13
14 PROC PRINT DATA = HMSALES;
ERROR: File WORK.HMSALES.DATA does not exist.
15 TITLE "Home Market Sales Data";
16 RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The PROCEDURE PRINT used 0.08 seconds.

Without the semicolon SAS reads the INPUT statement as
part of the DATA statement and thinks that besides the data set
HMSALES the step is trying to create the data sets INPUT
and CONNUMH. The $ is not a valid data set name in SAS
so the supervisor generates an error message and stops
executing the DATA step. The PROC PRINT executes, but
because the data set HMSALES was not created in the
DATA step, SAS generates an error message and stops
execution of the step.

Forgetting to include a semicolon is a very common
programming mistake. Remember that all SAS statements
end in semicolons. Be careful that a colon is not used
instead of a semicolon.

In the HM SALES program, if the semicolon is left off of
the end of the first comment, SAS would read the DATA
statement as part of the comment and complain that the
INPUT statement (and every other statement in the DATA
step) is not valid or it is used out of order.

1 *** CREATE HOME MARKET SALES ***
2
3 DATA HMSALES;
4 INPUT CONNUMH $ 1-2 GRSUPRH QTYH PACKH
-----
180
5 @11 SALEDTH DATE9.;
ERROR 180-322: Statement is not valid or it is used out of proper order.
(lines 6 through 14 deleted)
15
16 /* PRINT HOME MARKET SALES */
17
18 PROC PRINT DATA = HMSALES;
ERROR: File WORK.HMSALES.DATA does not exist.
19 TITLE "Home Market Sales Data";
20 RUN;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The PROCEDURE PRINT used 0.08 seconds.

The Program Will Not Stop

Besides a SAS comment statement that begins with a * and
ends with a ; there are comments that begin with a /* and
end with a */. If the */ is left off in the HM SALES
program, something strange happens:

1 *** CREATE HOME MARKET SALES ***;
2
3 DATA HMSALES;
4 INPUT CONNUMH $ 1-2 GRSUPRH QTYH PACKH
5 @11 SALEDTH DATE9.;
6 FORMAT SALEDTH DATE9.;
7 LIST;
8 CARDS;
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
9 01 23 7 4 07OCT1998
10 02 17 5 2 06OCT1998
11 03 52 2 8 07OCT1998
12 03 62 5 2 05OCT1998
NOTE: The data set WORK.HMSALES has 4 observations and 5 variables.
NOTE: The DATA statement used 1.46 seconds.
13 RUN;
14
15 /* PRINT HOME MARKET SALES
16
17 PROC PRINT DATA = HMSALES;
18 TITLE "Home Market Sales Data";
19 RUN;

Not only will the PROC PRINT not run (no messages are
printed after the PROC PRINT and no output is produced)
but SAS continues to run even though there is no more code to
compile and execute. The last statement in a SAS program
must be a RUN statement. Because the RUN statement was
read as part of the comment, SAS is patiently waiting for
more code to run. To fix this problem, submit

*/ RUN;

This will close the comment and give SAS the RUN
statement it needs to finish running the program. Similarly,
if a semicolon is left off the statement preceding the RUN
statement, the RUN statement will be treated as part of the
previous statement and the program will not finish. The
solution is to submit

; RUN;

Misspelled or Missing Keywords

To illustrate the next syntax error, the HM SALES program
is modified to read an external file.

1 FILENAME MYDATA 'D:\SDS\RAWDATA';
2
3 *** CREATE HOME MARKET SALES ***;
4
5 DATA HMSALES;
6 XFILE MYDATA;
------
180
7 INPUT CONNUMH $ 1-2 GRSUPRH QTYH PACKH
8 @11 SALEDTH DATE.;
9 RUN;
ERROR 180-322: Statement is not valid or it is used out of proper order.
ERROR: No CARDS or INFILE statement.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.HMSALES may be incomplete. When this
step was stopped there were 0 observations and 5 variables.
NOTE: The DATA statement used 0.05 seconds.

Because the keyword INFILE is misspelled as XFILE, SAS
does not know how to interpret the statement and an error is
generated. The INPUT statement requires a CARDS or
INFILE statement. As far as SAS is concerned, it cannot
find either statement, so another error is generated.
Correcting the keyword will fix the DATA step.

Coding Statements in the Wrong Place

Take a look at the following program:

1 *** CALCULATE THE AVERAGE TOTAL PRICE ***;
2
3 PROC MEANS DATA = HMSALES;
4 TOTPRICE = QTYH * GRSUPRH;
--------
180
5 VAR TOTPRICE;
ERROR: Variable TOTPRICE not found.
6 RUN;
ERROR 180-322: Statement is not valid or it is used out of proper order.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The PROCEDURE MEANS used 0.19 seconds.

SAS produces an error in the log because it does not accept the
assignment statement that calculates TOTPRICE. While the
assignment statement is syntactically correct, it is not
allowed in the PROC MEANS. Assignment statements
belong in DATA steps. An improved program would look
like this:

1 *** CALCULATE THE AVERAGE TOTAL PRICE ***;
2
3 DATA HMSALES;
4 SET HMSALES;
5 TOTPRICE = QTYH * GRSUPRH;
6 RUN;
NOTE: The data set WORK.HMSALES has 4 observations and 6 variables.
NOTE: The DATA statement used 0.07 seconds.
7
8 PROC MEANS DATA = HMSALES;
9 VAR TOTPRICE;
10 RUN;
NOTE: The PROCEDURE MEANS used 0.02 seconds.

The output looks like this:

Home Market Sales Data

Analysis Variable : TOTPRICE

N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
4   165.0000000   101.9182679    85.0000000   310.0000000
----------------------------------------------------------

Quickly Checking for Syntax Errors

Because the SAS supervisor compiles and runs SAS
programs one step at a time, a syntax error may not be
found until the last step of the program. If large data sets are
used to test the program, a system option

OPTION OBS = 0;

can be coded at the beginning of the program. This allows
the program to be run in syntax check mode. Every step will
be checked for syntax errors and compiled, but no data will
be run. This speeds up the debugging process by
eliminating the wait time for data to run through the
program.

It is worth mentioning that occasionally, setting the OBS to
zero will cause syntax errors to be generated. For example,
formats can be created using the PROC FORMAT with a
CNTLIN option. This option tells SAS to use the data set
specified in the CNTLIN to generate the format. As no data
is running through the program, the input data set has no
observations and the PROC FORMAT fails. This error goes
away when the input data set has observations.

B. Execution-Time Errors

When SAS runs data through a compiled step, it can
encounter execution-time errors. These kinds of errors,
depending on their severity, either generate notes in the log
and allow the program to continue running, or generate
error messages.

Uninitialized Variables

The following program creates a subset of 1998 Home
Market (HM) Sales:

1 *** KEEP 1998 SALES ***;
2
3 DATA HMSALES;
4 SET HMSALES;
5 IF YEAR(SALEDATE) = 1998;
6 RUN;
NOTE: Variable SALEDATE is uninitialized.
NOTE: Missing values were generated as a result of performing an
operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
4 at 160:7
NOTE: The data set WORK.HMSALES has 0 observations and 7 variables.
NOTE: The DATA statement used 0.15 seconds.

and generates the note about SALEDATE being
uninitialized. The message SAS writes to the log is a note as
opposed to an error because the variable SALEDATE is a
legitimate name for a SAS variable. It doesn't know that the
variable is misspelled. Because SALEDATE is
uninitialized, SAS sets its value to missing for all
observations. Missing is never equal to 1998 so the data set
HMSALES has no observations output to it.

Uninitialized variables occur for many reasons. Things to
look for are dropping the variable from the input data set,
misspelling the variable name, using the variable before it is
created, or using the wrong data set.

Another thing to watch for is accidentally creating an
uninitialized variable by assigning a non-existent variable to
itself.

For example, the following program calculates a net price by
adding any packing to gross unit price.

1 *** CALCULATE THE NET PRICE ***;
2
3 DATA HMSALES;
4 INFILE MYDATA;
5 INPUT CONNUMH $ 1-2 GRSUPRH;
6 PACKH = PACKH;
7 RUN;
NOTE: The infile MYDATA is:
FILENAME=D:\SDS\RAWDATA,
RECFM=V,LRECL=256
NOTE: 4 records were read from the infile MYDATA.
The minimum record length was 19.
The maximum record length was 19.
NOTE: The data set WORK.HMSALES has 4 observations and 3 variables.
NOTE: The DATA statement used 0.08 seconds.
8
9 DATA HMSALES;
10 SET HMSALES;
11 PACKH = PACKH;
12 NETPRIH = GRSUPRH + PACKH;
13 RUN;
NOTE: Missing values were generated as a result of performing an
operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
4 at 12:22
NOTE: The data set WORK.HMSALES has 4 observations and 4 variables.
NOTE: The DATA statement used 0.07 seconds.
14
15 PROC PRINT DATA = HMSALES;
16 TITLE "CALCULATION OF NET PRICE";
17 RUN;
NOTE: The PROCEDURE PRINT used 0.01 seconds.

While PACKH is not in the INPUT statement and therefore
does not exist (line 11), no uninitialized note is produced in
the first step. It is only when the program tries to use
PACKH in a calculation (line 12), that SAS produces the
uninitialized note.

Missing Values

In the previous example, missing values are generated in the
DATA step that calculates NETPRIH. The note
immediately following the note that indicates that "missing
values were generated as a result of performing an operation
on missing values" explains how many times the missing
values were generated, at what line they were generated,
and at what column they were generated.

In this example missing values are generated for all
observations in the data set. This information, coupled with
the uninitialized note, clearly points to the conclusion that a
variable used in the equation is somehow invalid.

If PACKH had originally been read in, it would not be
uninitialized. If missing values are being generated for only
a subset of the data set, a likely explanation would be that
there are missing values for PACKH in the input data set.

Having missing values for some sales may be okay, but the
result of using missing values in an arithmetic expression is
that the result will be set to missing. One of the rules of
SAS is that missing values propagate themselves. Every
NETPRIH that is calculated using a missing PACKH will
be missing. To avoid this, the SUM function can be used. It
returns the sum of all of its non-missing arguments.
It is not recommended to use the SUM
function unless the cause of missing values is known and
deemed appropriate. In the calculation of NETPRIH
example, if the SUM function is used to eliminate the
missing values note, NETPRIH will be equal only to
GRSUPRH, which is not the desired result.

Numeric and Character Conversions

In the following example a U.S. sales data set is created and
an attempt is made to merge the U.S. and HM sales data
sets together.

1 *** CREATE U.S. SALES ***;
2
3 DATA USSALES;
4 INPUT CONNUMU GRSUPRU QTYU PACKU SALEDTU;
5 LIST;
6 CARDS;
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
7 1 26 3 2 19981005
8 2 72 2 7 19981006
9 3 13 6 8 19981007
10 4 0 8 4 19981007
NOTE: The data set WORK.USSALES has 4 observations and 5 variables.
NOTE: The DATA statement used 0.05 seconds.
11 RUN;
12
13 DATA COMBINE1;
14 MERGE USSALES (IN = INUS)
15 HMSALES (IN = INHM
16 RENAME = (CONNUMH = CONNUMU));
ERROR: Variable CONNUMU has been defined as both character and
numeric.
17 BY CONNUMU;
18 IF INUS AND INHM;
19 RUN;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.COMBINE1 may be incomplete. When this
step was stopped there were 0 observations and 8 variables.
WARNING: Data set WORK.COMBINE1 was not replaced because this step
was stopped.
NOTE: The DATA statement used 0.05 seconds.

SAS generates a syntax error because the BY variable,
CONNUMU, has been defined as both character and
numeric. The DATA steps that create both data sets are
perfectly legitimate. But because the CONNUMs are of
different types, the merge will not work. To fix this problem
one CONNUM's type needs to be converted. The first
example uses a PUT function to convert numeric data to
character data. The new value is assigned to a temporary
CONNUM and the combination of a DROP = and a
RENAME = option in the DATA statement drops the
original numeric CONNUMU and renames the character
TEMPCON as a character CONNUMU.
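
Before converting either variable, it can help to confirm which type each one actually has. The following is a minimal sketch rather than part of the original program. It assumes the USSALES and HMSALES data sets created in the logs above, and uses PROC CONTENTS with the KEEP= data set option to list only the merge keys along with their type and length.

```sas
/* Sketch: list the type and length of each merge key.      */
/* Assumes WORK.USSALES and WORK.HMSALES exist as shown in  */
/* the logs above.                                          */
proc contents data = ussales (keep = connumu);
run;

proc contents data = hmsales (keep = connumh);
run;
```

If one listing reports the key as Num and the other as Char, the merge will fail with the error shown above until one of the two is converted.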
1 * CONVERT CONNUMU TO A CHARACTER VARIABLE *;
2
3 DATA USSALES2 (DROP = CONNUMU
4 RENAME = (TEMPCON = CONNUMU));
5 SET USSALES;
6 TEMPCON = PUT(CONNUMU, 2.);
7 TEMPCON = '0' || (LEFT(TEMPCON));
8 RUN;
NOTE: The data set WORK.USSALES2 has 4 observations and 5 variables.
NOTE: The DATA statement used 0.1 seconds.
9
10 DATA COMBINE2;
11 MERGE USSALES2 (IN = INUS)
12 HMSALES (IN = INHM
13 RENAME = (CONNUMH = CONNUMU));
14 BY CONNUMU;
15 IF INUS AND INHM;
16 RUN;
NOTE: The data set WORK.COMBINE2 has 4 observations and 8 variables.
NOTE: The DATA statement used 0.1 seconds.

Note that a '0' needs to be added to the value of
CONNUMU so that the merge will work properly. Also,
because character data is right justified, the value of
TEMPCON must be left justified before it is concatenated
to the '0'. Otherwise the concatenated value would look like
'0 1' and the third digit '1' would be truncated as
TEMPCON has a length of two.

To convert character data to numeric data, use an INPUT
function instead of a PUT function.

TEMPCON = INPUT(CONNUMH, 8.);

The following example illustrates two other possible causes
of values being converted by SAS:

1 *** CALCULATE THE NET PRICE ***;
2
3 DATA HMSALES;
4 INFILE MYDATA;
5 INPUT CONNUMH $ 1-2 GRSUPRH;
6 TOTAL = CONNUMH + GRSUPRH;
7 GRSUPRH = TRIM(GRSUPRH);
8 RUN;
NOTE: Character values have been converted to numeric values at the
places given by: (Line):(Column).
6:12 7:14
NOTE: Numeric values have been converted to character values at the
places given by: (Line):(Column).
7:19
NOTE: The infile MYDATA is:
FILENAME=D:\SDS\RAWDATA,
RECFM=V,LRECL=256
NOTE: 4 records were read from the infile MYDATA.
The minimum record length was 19.
The maximum record length was 19.
NOTE: The data set WORK.HMSALES has 4 observations and 3 variables.
NOTE: The DATA statement used 0.08 seconds.

The variable CONNUMH is a character variable and is
being used in an arithmetic expression so the value of
CONNUMH is converted to a numeric value. GRSUPRH is
numeric and the TRIM function is expecting a character
argument so the value of GRSUPRH is converted to a
character value, trimmed, and converted back to a numeric
value that is assigned to GRSUPRH.

While SAS will convert values automatically, it is
recommended that the program explicitly convert values.
SAS runs more efficiently if it does not have to figure out
how to convert data values. It is also safer. Converting
character data to numeric data can truncate leading zeros
from the data values. Converting numeric data to character
data will not automatically pad the converted value with a
zero. Automatic conversion may lead to unexpected results.

C. Invalid Data Errors

Invalid data errors occur when the raw data SAS is trying to
read in does not match the way SAS is trying to read the
data. In the following example, the INPUT statement is
trying to read in four numeric variables and one date
variable:

1 *** CREATE HOME MARKET SALES ***;
2
3 DATA HMSALES;
4 INPUT CONNUMH 1-6 GRSUPRH QTYH PACKH
5 @14 SALEDTH DATE9.;
6 FORMAT SALEDTH DATE9.;
7 LIST;
8 CARDS;
NOTE: Invalid data for CONNUMH in line 9 1-6.
NOTE: Invalid data for PACKH in line 9 11-19.
NOTE: Invalid data for SALEDTH in line 9 14-22.
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
9 01 23 7 4 07OCT1998
CONNUMH=. GRSUPRH=7 QTYH=4 PACKH=. SALEDTH=. _ERROR_=1
_N_=1
NOTE: The data set WORK.HMSALES has 1 observations and 5 variables.
NOTE: The DATA statement used 0.11 seconds.
10 RUN;

SAS prints the contents of the input buffer in the log under a
line labeled RULE. It also prints out the values assigned to
the variables. This information can be used to determine
why the data was not read in properly.

In this example, CONNUMH should have been read in
using columns 1-2, not columns 1-6. Columns 1-6 contain
the string '01 23 ' which is not a valid number. GRSUPRH
and QTYH are read in between columns 7-9, though
because the real value of GRSUPRH is in columns 4-5, the
wrong data values are read in for GRSUPRH and
QTYH. By the time PACKH is read in, SAS is looking at
the date value. The date value is not a valid number. Finally,
the program tells SAS to read in SALEDTH. The date value
really begins at column 11, so the reading of a partial date
value is also invalid. This example illustrates that it is
important to understand the data used by the program.

Character Field Truncation

Unlike computer languages that require the explicit
definition of variables at the top of the program, SAS
determines the attributes of a variable by the context in
which it is first used. This can cause truncation problems
with character variables.

The following program creates a variable COST that is
intended to indicate if a GRSUPRH is high.

1 *** CREATE WORD DAY ***;
2
3 DATA HMSALES;
4 SET HMSALES;
5 IF GRSUPRH < 50 THEN
6 COST = 'LOW';
7 ELSE
8 IF GRSUPRH >= 50 THEN
9 COST = 'HIGH';
10 RUN;
NOTE: The data set WORK.HMSALES has 4 observations and 6 variables.
NOTE: The DATA statement used 0.07 seconds.
11
12 PROC PRINT DATA = HMSALES;
13 TITLE "Home Market Sales Cost";
14 VAR GRSUPRH COST;
15 RUN;
NOTE: The PROCEDURE PRINT used 0.02 seconds.

The output looks like this:

Home Market Sales Cost

OBS    GRSUPRH    COST
  1       23      LOW
  2       17      LOW
  3       52      HIG
  4       62      HIG

Note that the value of COST in the output is either 'LOW' or
'HIG'. This is because SAS first encounters the variable
COST in the first assignment statement. The value that is
being assigned to COST is character and has a length of
three. Even though the second assignment statement tries to
assign a character value with a length of four, the attributes
of the variable COST have already been set.

To prevent character values from being truncated, code a
LENGTH or ATTRIB statement at the top of the DATA
step to explicitly define the length of the character variable.
Here are examples of each:

LENGTH COST $ 4;
ATTRIB COST LENGTH = $4;

Illegal Mathematical Operations

Occasionally, certain data values will cause a SAS program
to fail because it has performed an illegal mathematical
operation. The following program tries to determine what
percentage of the gross unit price is made up of packing
expenses.

1 DATA USSALES;
2 SET USSALES;
3 PERCNT = (GRSUPRU - PACKU) / GRSUPRU * 100;
4 RUN;
NOTE: Division by zero detected at line 3 column 31.
CONNUMU=4 GRSUPRU=0 QTYU=8 PACKU=4 SALEDTU=19981007
PERCNT=. _ERROR_=1 _N_=4
NOTE: Missing values were generated as a result of performing an
operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 3:41
NOTE: Mathematical operations could not be performed at the following
places. The results of the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 3:31
NOTE: The data set WORK.USSALES has 4 observations and 6 variables.
NOTE: The DATA statement used 0.08 seconds.

SAS generates an error because GRSUPRU is equal to zero
in the fourth sale. An easy way to revise the program,
assuming that zero is a valid value for GRSUPRUs, is to
check the value of GRSUPRU before calculating
PERCNT.

1 DATA USSALES;
2 SET USSALES;
3 IF GRSUPRU = 0 THEN
4 PERCNT = 0;
5 ELSE
6 PERCNT = (GRSUPRU - PACKU) / GRSUPRU * 100;
8 RUN;
NOTE: The data set WORK.USSALES has 4 observations and 7 variables.
NOTE: The DATA statement used 0.07 seconds.

If SAS generates an error indicating it has performed an
illegal mathematical operation, verify that the values
causing the illegal mathematical operation are valid. If the
data is valid, add conditional processing to selectively
perform the operation. If the data is invalid, correct the data.

By Group Processing

If data has been sorted by a certain key variable, subsequent
processing can take advantage of the fact that
data is organized in unique groupings. For example, it
allows two data sets to be merged together by the sorted
variable. The following program tries to print out HM sales
by SALEDTH:

1 PROC PRINT DATA = HMSALES;
2 BY SALEDTH;
3 TITLE "HM Sales by Sale Date";
4 RUN;
ERROR: Data set WORK.HMSALES is not sorted in ascending sequence.
The current by-group has SALEDTH = 07OCT1998 and the next by-group
Has SALEDTH = 06OCT1998.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: The PROCEDURE PRINT used 0.02 seconds.

SAS cannot print out the data because it has not previously
been sorted by sale date. The solution is to sort the data set
before printing it.

1 PROC SORT DATA = HMSALES OUT = HMSALES;
2 BY SALEDTH;
3 RUN;
NOTE: The data set WORK.HMSALES has 4 observations and 6 variables.
NOTE: The PROCEDURE SORT used 0.14 seconds.
4
5 PROC PRINT DATA = HMSALES;
6 BY SALEDTH;
7 TITLE "HM Sales by Sale Date";
8 RUN;
NOTE: The PROCEDURE PRINT used 0.02 seconds.

The output looks like this:

HM Sales by Sale Date

----------------- SALEDTH=05OCT1998 -------------------
OBS CONNUMH GRSUPRH QTYH PACKH
1 03 62 5 2

------------------- SALEDTH=06OCT1998 ------------------
OBS CONNUMH GRSUPRH QTYH PACKH
2 02 17 5 2

------------------ SALEDTH=07OCT1998 -------------------
OBS CONNUMH GRSUPRH QTYH PACKH
3 01 23 7 4
4 03 52 2 8

Many people assume that the data must be explicitly sorted
by SAS to take advantage of BY processing. If the data is
read in already sorted, BY processing will work without
needing to sort the data. It is safer to sort the data though, as
it is not always possible to know ahead of time if the data is
already sorted as it is being read in.

D. Logic Errors

Sometimes after all syntax errors have been cleaned up and
no warnings or notes indicate there is anything wrong with
the program, the results of the program are still wrong. For
example, the following program is designed to add home
market sales information to each observation in the U.S.
sales data set:

1 DATA MATCH;
2 MERGE USSALES2 (IN = INUS)
3 HMSALES (IN = INHM
4 RENAME = (CONNUMH = CONNUMU));
5 BY CONNUMU;
6 RUN;
NOTE: The data set WORK.MATCH has 5 observations and 10 variables.
NOTE: The DATA statement used 0.1 seconds.

Everything looks fine at a quick glance. The program is not
producing the desired results though. There are four U.S.
sales input into the DATA step and five combined
observations being output. This is because there are two
HM sales with a CONNUMH of '03'. SAS merges the U.S.
sale with a CONNUMU of '03' to both HM sales. To rectify
this problem, the duplicate values in the HM data set could
be eliminated by sorting the HM data set with a
NODUPKEY option before the merge.

It is worth noting the way SAS merges data sets that have
repeats of BY variables. If in the above example,
USSALES2 had three observations with a value of '03' for
CONNUMU the log would look like this:

1 DATA MATCH;
2 MERGE USSALES2 (IN = INUS)
3 HMSALES (IN = INHM
4 RENAME = (CONNUMH = CONNUMU));
5 BY CONNUMU;
6 RUN;
NOTE: MERGE statement has more than one data set with repeats of BY
values.
NOTE: The data set WORK.MATCH has 8 observations and 10 variables.
NOTE: The DATA statement used 0.2 seconds.

When SAS encounters more than one occurrence of a BY
group value in both data sets, the DATA step MERGE does
not perform a Cartesian join. It pairs the observations of the
BY group in the first data set with those in the second data
set one by one, in order. When one data set runs out of
observations for the BY group, the values from its last
observation are retained for the remaining observations of
the other data set. While this is not an error, as with the previous
example, the desired results were not produced.

Diagnosing Logic Errors

There are several techniques for diagnosing logic errors.
The best technique is to prevent them from happening. Key
to preventing logic errors is understanding the requirements
of the program (what it is supposed to do) and knowing
what the data looks like. In the above example, the program
is supposed to add HM sales to U.S. sales. Instead it joins
both kinds of information together, duplicating one of the
U.S. sales in the process.

The incorrect assumption made while writing the program
is that there is only one sale for each CONNUM in each
market. An examination of the data would reveal that
repeats of CONNUMs can occur.

The data can be examined by printing out the data set or
printing out a subset of the data. The Libraries Window can
be used to view the data in a spreadsheet view. The SAS
Data Set Viewer, a utility that can be downloaded from the
SAS Institute web page (www.sas.com), also allows for
viewing of data in a spreadsheet view.

Once a program is written and checked for errors, if the
desired results are not achieved, these tools can help diagnose
why the program is not working properly. PROC PRINT steps
can be added after every step to show how the data is
changing step by step. To examine how data changes within
a step, PUT statements can be used to print out the value of
variables as they are being processed. The DATA step
debugger can also be used to execute a DATA step one
statement at a time and to print out variable values after
each statement is executed.

Resources for Further Study

Many people who ask for debugging help do not take
advantage of the online help feature in SAS. Searching the
Index Tab in the Help Topics window finds discussions of
different types of errors. For example, the Accessing Files
page explains that the error

Error: File is in use, filename.

indicates that “The file you are trying to access is in use by
another Windows process, such as another Windows
application.”

Another feature in SAS that many beginning users could
learn much from is the online tutorial. Module 1 Lesson 3
explains how to debug programs and allows the user to
practice debugging skills with simple programs built into
the tutorial.

The Technical Support area of the SAS Institute web site,
www.sas.com, can provide support in the form of the
database of questions and answers to common problems. A
search engine is available to help find information on
specific topics.

The SAS-L listserve is a forum where people can ask
questions or seek guidance on programming problems. The
list is very active and helpful.

The papers and books listed in the references also provide
guidance in learning how to debug SAS programs.

Conclusion

To debug programs successfully it is important to
understand how SAS runs programs. SAS first checks a
step for syntax errors. If there are no syntax errors, SAS
compiles the step and runs data through it. Depending on
the way the step is written and what data is read into the
step, execution-time errors or data errors may be generated.
Logic errors can also produce undesired results.

It is important to know what the program is trying to
achieve and to understand the data the program is using.
Reading the log for all three kinds of messages (errors,
notes, and warnings) is crucial to confirming that a program
is working properly. Do not rely on the output alone, and do
not assume that the program ran correctly just because there
are no error messages in the log.

While the task of debugging a program may be difficult at
first, the task gets easier with a little practice.
Understanding why an error has occurred will not only help
prevent it from happening again in the future, but will also
help provide insight as to how SAS works. This can help
one write better programs with fewer bugs.

SAS is a registered trademark of SAS Institute, Inc. Cary, NC.

References

Delwiche, Lora D. and Susan J. Slaughter (1995). The Little
SAS Book: A Primer. SAS Institute, Cary, NC.

Howard, Neil and Linda Williams Pickle and James B.
Pearson (1996). It's not a Bug, It's A Feature!! Proceedings
of the 21st Annual SAS Users Group International
Conference, pp. 370-378.

SAS Language and Procedures: Usage, Version 6, First
Edition (1989). SAS Institute, Cary, NC.

Virgile, Bob (1996). The Dirty Dozen: Twelve Common
Programming Mistakes. Proceedings of the Northeast SAS
Users Group '96 Conference, pp. 205-210.

Walega, Michael A. (1997). Search Your Log Files for
Problems the Easy Way: Use LOGCHK.SAS. Proceedings
of the Northeast SAS Users Group '97 Conference, pp.
343-344.

Contact Information

Peter Knapp
U.S. Department of Commerce
14th & Constitution Avenue, NW
Room 7866
Washington, DC 20230
202/482-1359 (voice)
202/482-1388 (fax)
Peter_Knapp@ita.doc.gov