Some frequently asked SAS Interview questions
SAS (originally short for Statistical Analysis System) is one of the most commonly used languages for statistical analysis throughout the world, due to its tremendous power for data mining and statistical analysis.
In this post, we will be taking you through the most frequently asked SAS interview questions.
We won't just be providing answers; we will explain them through examples as well.
1. What is DATA _NULL_ in SAS?
DATA _NULL_ is one of the most commonly used forms of the DATA step in SAS: it executes the DATA step without creating an output dataset.
It is often used to create macro variables for later use in the program, or to write values to the SAS log or to an external file.
Using DATA _NULL_ avoids consuming unnecessary space by creating a dataset when all you need is, for example, a macro variable.
Example 1:
data _null_;
length country $15;
input country $ num_of_states;
put country num_of_states;
datalines;
India 29
Brazil 26
America 50
;
run;
The following output is written in the SAS log:
India 29
Brazil 26
America 50
Example 2:
data _null_;
call symput('blogname',"Onespot Analytics");
run;
The above DATA _NULL_ step creates the macro variable blogname with the value "Onespot Analytics", without creating any dataset.
You can check the value of the macro variable by writing:
%put &blogname.;
The output in the SAS log is:
Onespot Analytics
2. What are macro variables in SAS?
Macro variables are user-defined symbols that let you store text and reuse it later in a SAS program.
Small or large amounts of text can be assigned to a macro variable, and the text is substituted wherever the macro variable is referenced.
Some facts about macro variables:
- The value stored in a macro variable can have a maximum length of 65,534 characters.
- The length of a macro variable is determined by the text assigned to it instead of a specific length declaration, so its length varies with each value it contains.
- Macro variables contain only character data. However, the macro facility has features that enable a variable to be evaluated as a number when it contains character data that can be interpreted as a number. The example below illustrates this:
%macro test(finish); /*Create a macro named test with finish as a parameter*/
%let i=1; /*Create a macro variable "i" with a value of 1*/
%do %while (&i<&finish); /*Loop while "i" is less than "finish"*/
%put the value of i is &i;
%let i= %eval(&i+1); /*Increment the value of i*/
%end;
%mend test;
%test(5);
Note: In line 3 of the code above, although i is a macro variable (and therefore holds text), the "less than" comparison works, because %DO %WHILE evaluates its condition with %EVAL, which treats text such as 1 or 5 as an integer.
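A minimal sketch of the same point outside a macro (the names expr and result are illustrative): the macro variable holds only the text 1+2 until %EVAL interprets that text as integer arithmetic.

```sas
%let expr = 1+2;             /* stored as the 3-character text 1+2 */
%let result = %eval(&expr);  /* %EVAL evaluates the text as integer arithmetic */
%put expr is &expr;          /* writes: expr is 1+2 */
%put result is &result;      /* writes: result is 3 */
```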
3. What are the different ways by which you can create macro variables in SAS?
There are 4 ways to create macro variables in SAS:
- Using the %let statement
%let i= 1; /*Creates the macro variable i with the value 1*/
- Using call symput
data _null_;
call symput('blogname',"Onespot Analytics");
run;
Creates the macro variable "blogname" with the value "Onespot Analytics"
- Using the INTO clause of a PROC SQL SELECT statement
proc sql;
select max(age) into :max_age
from student;
quit;
Stores the maximum of variable age from student in the macro variable max_age.
- As a parameter to a macro
%macro student_details(age,height);
%put The age of the student is &age. years and his height is &height. cms.;
%mend;
%student_details(17, 178);
The 2 macro variables age and height are created as parameters of the macro student_details and receive the values 17 and 178 respectively.
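The sketch below puts three of these methods together in one program. The student dataset, its values and the macro variable school are made up for illustration. It uses CALL SYMPUTX, a variant of CALL SYMPUT that also removes leading and trailing blanks, and the TRIMMED keyword of PROC SQL, which requires SAS 9.3 or later.

```sas
/* Hypothetical sample data for illustration */
data student;
  input name $ age height;
  datalines;
Asha 15 160
Ben 17 178
Carla 16 165
;
run;

%let school = Springfield;  /* method 1: %LET (hypothetical value) */

data _null_;                /* method 2: CALL SYMPUTX from a DATA step */
  set student end=last;
  if last then call symputx('n_students', _n_);
run;

proc sql noprint;           /* method 3: SELECT ... INTO in PROC SQL */
  select max(age) into :max_age trimmed
  from student;
quit;

/* writes: Springfield has 3 students, oldest aged 17 */
%put &school has &n_students students, oldest aged &max_age;
```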
4. In SAS, what is the minimum length of a numeric variable and of a character variable?
The minimum length of a numeric variable is 3 bytes on Windows and UNIX (2 bytes on z/OS mainframes), whereas the minimum length of a character variable is 1 byte.
5. How to drop variables/columns from a dataset in SAS?
The example below drops the 3 variables age, height and weight from the dataset old_student while creating new_student.
Example:
data new_student(drop= age height weight);
set old_student;
run;
Another way of writing it would be:
data new_student;
set old_student;
drop age height weight;
run;
6. How to print the first n number of observations from a dataset in SAS?
The OBS= dataset option can be used to print the first n observations of a dataset.
Similarly, the OUTOBS= option can be used to display only n rows in PROC SQL.
Example 1:
proc print data=student(obs=10);
run;
Prints the first 10 observations from the dataset student.
proc print data=student(firstobs= 3 obs=10);
run;
Prints observations 3 through 10 (8 observations) of the dataset student: FIRSTOBS= sets the first observation to process and OBS= sets the number of the last one.
Example 2:
proc sql outobs = 10;
select * from student;
quit;
The OUTOBS= option restricts the number of rows that are displayed, but not the rows that are read.
proc sql outobs = 10 inobs=10;
select * from student;
quit;
The INOBS= option restricts the number of rows that are read from the student dataset.
7. What is the difference between SAS PROCs (Procedures) and the SAS DATA STEP?
SAS PROCs are built-in routines in SAS, each serving a specific purpose, whereas the DATA step is a user-written set of statements designed to read in and manipulate data.
Examples of SAS PROCs are:
PROC SORT
PROC SQL
PROC FREQ
PROC CONTENTS
PROC PRINT
PROC MEANS
PROC SUMMARY
PROC TRANSPOSE
PROC TABULATE
PROC REPORT
PROC UNIVARIATE
8. What is a Program Data Vector (PDV) in SAS?
The Program Data Vector (PDV) is a logical area in memory where SAS builds the current observation, one observation at a time.
Let us explain PDV with an example:
data student2;
set student (keep= name age height); /*Keep the 3 variables name, age and height*/
run;
SAS reads the dataset student into student2 row by row, that is, one observation at a time.
Each row is first read from student into the PDV and then written from the PDV to student2.
Along with the columns read from the dataset, SAS creates 2 automatic variables in the PDV: _N_ and _ERROR_.
- _N_ counts the iterations of the DATA step; in this example it equals the number of the row currently being read.
- _ERROR_ acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist.
The first row of student is read into the PDV, where _N_ is 1 and (assuming no errors) _ERROR_ is 0.
This row is then written to student2, without _N_ and _ERROR_. student2 now has 1 row.
Next, the second row is read from student into the PDV and then written to student2, and so on until the last row has been read.
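To watch the PDV at work, a minimal sketch (with a small made-up student dataset) can write the whole PDV to the log with PUT _ALL_ on every iteration. Each log line should show one row of student together with _N_ and _ERROR_, while student2 itself ends up with only name, age and height.

```sas
/* Hypothetical sample data for illustration */
data student;
  input name $ age height weight;
  datalines;
Asha 15 160 50
Ben 17 178 70
;
run;

data student2;
  set student (keep= name age height);
  put _all_;   /* writes the PDV, including _N_ and _ERROR_, to the log */
run;
```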
9. What is the difference between INTNX and INTCK functions in SAS?
INTNX and INTCK are interval functions in SAS.
The INTNX function increments or decrements a date by a given number of intervals, whereas the INTCK function counts the number of intervals between two date values.
data _null_;
call symputx("start_year", intnx("year", today(), 0, "B"));
call symputx("diff_year", intck("year", '01JAN2013'D, '10JAN2015'D));
run;
The INTNX function above returns the first day of the year of today's date, as a SAS date value.
The INTCK function returns the number of year boundaries crossed between the 2 dates, which is 2 here.
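A sketch with fixed dates (chosen only for illustration) makes the results repeatable and shows two points that often come up in interviews: INTNX aligns to the beginning of the interval by default, and INTCK counts interval boundaries crossed rather than elapsed time.

```sas
data interval_demo;
  d = '15MAR2015'd;                        /* an arbitrary example date */
  next_month = intnx('month', d, 1);       /* 01APR2015: aligned to the start of the month by default */
  month_end  = intnx('month', d, 0, 'E');  /* 31MAR2015: end of the same month */
  year_start = intnx('year', d, 0, 'B');   /* 01JAN2015 */
  /* INTCK counts interval boundaries crossed, not elapsed time */
  n_months = intck('month', '31JAN2015'd, '01FEB2015'd);  /* 1, although the dates are 1 day apart */
  n_years  = intck('year', '01JAN2013'd, '10JAN2015'd);   /* 2 */
  format d next_month month_end year_start date9.;
  put _all_;
run;
```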
10. What are _NUMERIC_ and _CHARACTER_ in SAS and what do they do?
- _NUMERIC_ specifies all the numeric variables that are present in the current DATA step.
- _CHARACTER_ specifies all the character variables present in the current DATA step.
- _ALL_ specifies all the variables that are present in the current DATA step.
Example:
Let us assume the dataset student has the following columns:
idnumber name $ age height weight birthplace $
If you want to keep only the numeric variables:
data student2;
set student(keep = idnumber-numeric-weight); /*Keeps the numeric variables from idnumber through weight: idnumber, age, height and weight*/
run;
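_NUMERIC_ is often used to define an array over every numeric variable. A minimal sketch, assuming the student columns above with made-up values, that replaces each numeric missing value with 0:

```sas
/* Hypothetical sample data matching the columns above */
data student;
  input idnumber name $ age height weight birthplace $;
  datalines;
1 Asha 15 . 50 Pune
2 Ben . 178 70 Oslo
;
run;

data student_filled;
  set student;
  /* _NUMERIC_ = every numeric variable defined so far: idnumber, age, height, weight */
  array nums{*} _numeric_;
  do i = 1 to dim(nums);
    if missing(nums{i}) then nums{i} = 0;
  end;
  drop i;
run;
```

Because i is first used after the ARRAY statement, it is not part of _NUMERIC_ there.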
11. What are the different ways in which you can check whether a column in a dataset has unique values or not?
There are several methods to do this:
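Let's check whether the height variable has duplicate values.
- Using proc freq
proc freq data=student;
tables height;
run;
In the SAS Output window, all values of the variable height are displayed along with their frequencies. Any value with a frequency greater than 1 is duplicated.
- Using proc sort nodupkey & dupout
proc sort data=student out=student_nodup nodupkey dupout=student_dup; /*student_nodup is an illustrative output name*/
by height;
run;
In the student_dup dataset, all duplicate values of the variable height are stored.
- Using first. and last.
data student_dupl student_unique; /*assumes student is already sorted by height*/
set student;
by height;
if first.height and last.height then output student_unique; /*this height occurs only once*/
else output student_dupl;
run;
The dataset student_dupl contains the rows whose value of height occurs more than once, and student_unique contains the rows whose value of height is unique.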
12. How will you convert a numeric variable into a character variable and a character variable into a numeric variable?
To convert a numeric variable into a character variable, use the put function:
char_value = put(numeric_value, 10.); /*a numeric format, because numeric_value is numeric*/
To convert a character variable into a numeric variable, use the input function:
numeric_value = input(char_value, 10.);
Another way of converting a character variable into a numeric variable is to multiply it by 1; SAS then converts the value automatically and writes a note to the log:
numeric_value = char_value * 1;
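A short sketch of both conversions (the variable names and values are illustrative). BEST12. is used here as a general-purpose numeric format, and STRIP removes the leading blanks that PUT produces when the number is narrower than the format width.

```sas
data convert_demo;
  num_value   = 1234.5;
  /* PUT applies a numeric format and returns text; STRIP removes leading blanks */
  char_value  = strip(put(num_value, best12.));  /* '1234.5' */
  text_value  = '42';
  /* INPUT reads the text with a numeric informat and returns a number */
  back_to_num = input(text_value, 8.);           /* 42 */
run;
```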
13. What are the different data types in SAS?
SAS has only 2 datatypes - numeric and character.
A common question is: isn't date also a datatype?
Actually, in SAS, dates are also stored as numbers (the number of days since 01 January 1960), as explained in question 17.
14. What is the difference between NODUP and NODUPKEY options in PROC SORT in SAS?
NODUP compares all the variables of each observation with the previous observation and deletes exact duplicates, while NODUPKEY compares only the BY variables.
proc sort data= student nodupkey;
by height;
run;
In the above code, only the first record for each value of height is kept; the other records with that value of height are deleted.
proc sort data= student nodup;
by _all_;
run;
Only rows in which every variable has the same value as in another row are deleted.
Note: We sort by _all_ (all the variables) so that rows that are identical in every variable end up next to one another.
This is required because NODUP only compares each observation with the one immediately before it.
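The difference is easiest to see on a tiny made-up dataset in which two rows are exact duplicates and a third row shares only the height. The output names by_key and by_row are illustrative.

```sas
/* Hypothetical sample data: rows 1 and 2 are identical, row 3 shares only height */
data student;
  input name $ height;
  datalines;
Asha 160
Asha 160
Ben 160
Carla 165
;
run;

proc sort data=student out=by_key nodupkey;
  by height;   /* keeps the first row for each height: 2 rows */
run;

proc sort data=student out=by_row nodup;
  by _all_;    /* removes only the exact duplicate: 3 rows */
run;
```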
15. What does the RUN statement do in SAS?
When SAS encounters the RUN statement, it executes the DATA step or PROC step submitted before it (a DATA step is first compiled and then executed).
16. How does SAS internally represent character and numeric missing values?
Character missing values are represented as a blank (" "), and numeric missing values are represented as a period (.).
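A minimal sketch (with illustrative variable names) showing that the MISSING function recognises both kinds of missing value, and that a numeric missing value compares as smaller than any number:

```sas
data missing_demo;
  length name $10;
  name   = ' ';   /* character missing: a blank */
  height = .;     /* numeric missing: a period */
  /* MISSING() works for both types: 1 here */
  both_missing = missing(name) and missing(height);
  /* a numeric missing value is smaller than any number, so this is 1 (true) */
  below_zero = (height < 0);
run;
```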
17. Which date is represented by the SAS date value of 1900?
01 January 1960 is the reference date in SAS: it is stored as SAS date value 0, and every other date is stored as the number of days before or after it.
Therefore, the SAS date value 1900 (1900 days after 01 January 1960) is 15 March 1965.
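This can be checked directly in SAS: the date constant '01JAN1960'd evaluates to 0, and applying the DATE9. format to 1900 displays the corresponding date. A minimal sketch (date_demo and its variables are illustrative names):

```sas
data date_demo;
  epoch = '01JAN1960'd;           /* SAS date value 0 */
  value_1900 = 1900;
  put epoch= value_1900= date9.;  /* writes: epoch=0 value_1900=15MAR1965 */
run;
```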
18. What are the differences between Proc MEANS and Proc SUMMARY?
Proc SUMMARY and Proc MEANS have the same functionality in SAS and both of the procedures compute descriptive statistics.
However there are 2 differences between the procedures:
1) PROC MEANS by default prints its output in the Output window, whereas PROC SUMMARY does not.
However, by specifying the PRINT option in PROC SUMMARY, the results can be written to the Output window.
2) The second difference is the behavior of the two procedures when the VAR statement is omitted.
PROC MEANS analyses all numeric variables and produces default statistics for these variables (N, Mean, Standard Deviation, Minimum and Maximum), whereas PROC SUMMARY simply produces the count of observations.
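A sketch of both procedures run against sashelp.class, the sample dataset shipped with SAS. The last step shows the typical use of PROC SUMMARY, writing statistics to a dataset (height_stats and avg_height are illustrative names) instead of printing them.

```sas
/* sashelp.class ships with SAS */
proc means data=sashelp.class;          /* no VAR statement: N, Mean, Std Dev, Min, Max for all numeric variables */
run;

proc summary data=sashelp.class print;  /* PRINT sends the results to the Output window */
  var height;
run;

proc summary data=sashelp.class;        /* nothing printed; statistics go to a dataset */
  var height;
  output out=height_stats mean=avg_height;
run;
```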
19. What's the difference between Keep= VAL1 - VAL3 and Keep= VAL1 -- VAL3?
Suppose the dataset contains the variables VAL1, VAL2, A1 and VAL3, in that order.
- Using the single hyphen, only the consecutively numbered variables VAL1, VAL2 and VAL3 are kept.
- Using the double hyphen, all the variables positioned between VAL1 and VAL3 are kept, that is, VAL1, VAL2, A1 and VAL3.
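A minimal sketch that reproduces this with a made-up one-row dataset whose variables are stored in the order VAL1, VAL2, A1, VAL3 (vals, single and double are illustrative names):

```sas
/* Hypothetical dataset with variables in the order VAL1, VAL2, A1, VAL3 */
data vals;
  input VAL1 VAL2 A1 VAL3;
  datalines;
1 2 3 4
;
run;

data single;
  set vals(keep=VAL1-VAL3);   /* numbered range list: VAL1, VAL2, VAL3 */
run;

data double;
  set vals(keep=VAL1--VAL3);  /* positional range list: VAL1, VAL2, A1, VAL3 */
run;
```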
SAS compiles the statements.
SAS does not read and execute one statement at a time.
Instead, it waits for a step boundary such as a RUN or QUIT statement and then compiles and executes the corresponding DATA step or PROC step. When a DATA step is submitted for execution, SAS checks the syntax of the SAS statements and translates them into machine code. During this compilation phase, SAS identifies the type and length of each variable and determines whether a type conversion is necessary for each subsequent reference to a variable.
This behaviour is closer to that of a compiler than that of an interpreter.
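One way to observe the compile phase, sketched below, is the common IF 0 THEN SET idiom: the SET statement never executes, yet its variables are in the PDV because SAS reads the descriptor of sashelp.class while compiling the step. The dataset name compile_demo is illustrative.

```sas
data compile_demo;
  /* never executes (0 is always false), but at compile time SAS still adds
     name and age from sashelp.class to the PDV */
  if 0 then set sashelp.class(keep=name age);
  put name= age=;  /* writes a blank name and a missing age: the variables exist but were never filled */
  output;
  stop;            /* end the step after one iteration */
run;
```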
21. Why is SAS called self-documenting?
When a SAS dataset is created, SAS automatically creates the "descriptor" portion as well as the data portion of the dataset.
In the descriptor portion, SAS stores details such as when the dataset was created, the number of observations (rows) and the number of variables (columns) in the dataset.
Because SAS automatically records and stores this description of the data, SAS is called self-documenting.
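The descriptor portion can be viewed with PROC CONTENTS, or queried through the DICTIONARY.TABLES view in PROC SQL. A sketch using the sample dataset sashelp.class (class_meta is an illustrative name):

```sas
/* Print the descriptor portion of sashelp.class */
proc contents data=sashelp.class;
run;

/* Query the same metadata from the DICTIONARY.TABLES view */
proc sql;
  create table class_meta as
  select nobs, nvar, crdate
  from dictionary.tables
  where libname = 'SASHELP' and memname = 'CLASS';
quit;
```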
22. What is the difference between Z= a+b+c+d and Z= sum(a,b,c,d)?
Using the + operator, if any of the variables contains a missing value, the result is automatically set to a missing value.
Whereas the SUM function ignores missing values and adds up the non-missing ones (it returns a missing value only if all of its arguments are missing).
Example:
X= 1 + . + 5 + 7 = . (as the second value is a missing value)
whereas X= sum(1, . , 5 , 7) = 13
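A minimal sketch of the same example in a DATA step (variable names are illustrative), including the edge case where every argument of SUM is missing:

```sas
data sum_demo;
  a = 1; b = .; c = 5; d = 7;
  z_plus = a + b + c + d;    /* missing, because b is missing */
  z_sum  = sum(a, b, c, d);  /* 13: SUM ignores the missing argument */
  z_none = sum(., .);        /* missing: all arguments are missing */
  z_zero = sum(0, ., .);     /* 0: including 0 guarantees a non-missing result */
run;
```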
23. At compile time, when a SAS dataset is read, what items are created by SAS?
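At compile time, the following items are created by SAS:
a) Input buffer (created only when raw data is read with an INPUT statement; when a SAS dataset is read with SET, values go directly into the PDV)
b) Program Data Vector (PDV)
c) Descriptor information, such as the date and time when the dataset was created and the number of observations and variables in the dataset.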
24. In SAS Array processing, what does the DIM function do?
DIM is the dimension function in SAS.
It returns the number of elements in an array.
Example:
In a 1-dimensional array
array big{5} weight sex height state city;
do i=1 to dim(big);
/*more SAS statements*/
end;
The DO loop runs 5 times because the array big contains 5 elements.
Example:
In a multi-dimensional array, the DIM function returns the number of elements in a specified dimension of the array.
array mult{5,10,2} mult1-mult100;
The above statement creates a 3-dimensional array with 5*10*2 = 100 elements.
DIM(mult, 1) would return 5.
DIM(mult, 2) would return 10.
DIM(mult, 3) would return 2.
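A minimal sketch that computes these values in a DATA step (dim_demo and the d variables are illustrative names). DIM3(mult) is an equivalent way of writing DIM(mult, 3).

```sas
data dim_demo;
  array mult{5,10,2} mult1-mult100;  /* 5*10*2 = 100 elements */
  d1 = dim(mult, 1);                 /* 5 */
  d2 = dim(mult, 2);                 /* 10 */
  d3 = dim(mult, 3);                 /* 2 */
  d3_alt = dim3(mult);               /* same as DIM(mult, 3): 2 */
  drop mult1-mult100;
run;
```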