NESUG 2008 Foundations & Fundamentals

Guido’s Guide to PROC MEANS – A Tutorial for Beginners Using the SAS® System

. Guido, University of Rochester Medical Center, Rochester, NY

PROC MEANS is a basic procedure within BASE SAS® used primarily for answering questions about quantities (How much? What is the average? What is the total?). It is the procedure that I use second only to PROC FREQ in both data management and basic data analysis. PROC MEANS can also be used to conduct some basic statistical analysis. This beginning tutorial touches upon many of the practical uses of PROC MEANS, offers some helpful tips for working with numeric data, and gives a framework on which to build and extend your knowledge of the SAS System.

INTRODUCTION

The first in this series, “Guido’s Guide to PROC FREQ – A Tutorial for Beginners Using the SAS® System”, dealt with answering the question “How many?”. This second guide concentrates on answering the question “How much?”.

The Version 9 SAS® Procedure Manual states, “The MEANS procedure provides data summarization tools to compute descriptive statistics across all observations and within groups of observations. For example, PROC MEANS calculates descriptive statistics based on moments, estimates quantiles, which includes the median, calculates confidence limits for the mean, identifies extreme values and performs a t-test”.

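The following statements are used in PROC MEANS according to the SAS® Procedure Manual:

PROC MEANS <option(s)> <statistic-keyword(s)>;
   BY <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;
   CLASS variable(s) </ option(s)>;
   FREQ variable;
   ID variable(s);
   OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)> <id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)> </ option(s)>;
   TYPES request(s);
   VAR variable(s) </ WEIGHT=weight-variable>;
   WAYS list;
   WEIGHT variable;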


Our first question is: what is the average age of the 100 subjects in the SAS dataset Trial?

PROC MEANS DATA=Trial;
   VAR Age;
RUN;

The SAS System
The MEANS Procedure

Analysis Variable : AGE

  N    Mean          Std Dev       Minimum       Maximum
100    42.5800000    12.0169745    19.0000000    70.0000000

The output above gives us five simple statistics. The number of subjects is represented by N (N=100). The minimum age of the subjects is represented by Minimum (Min=19) and the maximum age by Maximum (Max=70). The mean age of the subjects is represented by Mean (Mean=42.58), and the standard deviation of the ages by Std Dev (Std Dev=12.0169745). So the answer to our first question, the average age of the 100 subjects, is 42.58 years.
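The Std Dev in this output measures the spread of the individual ages. It is not the standard error of the mean, which is the STDERR keyword in Appendix A. The standard error divides the standard deviation by the square root of N. Worked with the numbers from the output above:

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{12.0169745}{\sqrt{100}} = 1.20169745
```

The standard error is the quantity behind the confidence limits (CLM, LCLM, UCLM) and the t statistic (T) used later in this tutorial.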

Now we want to know the mean age of the men and the mean age of the women, so we add a CLASS statement to our program to answer this question.

PROC MEANS DATA=Trial;
   CLASS Sex;
   VAR Age;
RUN;

The SAS System
The MEANS Procedure

Analysis Variable : AGE

SEX   N Obs    N    Mean          Std Dev       Minimum       Maximum
F        56   56    42.0892857    12.1464949    19.0000000    69.0000000
M        44   44    43.2045455    11.9603600    19.0000000    70.0000000


The great thing about the SAS System is that there are almost always two or more ways to do the same thing. So another way to calculate the mean age of men and the mean age of women is to use a BY statement instead of a CLASS statement. The only caveat is that whenever you use a BY statement, the SAS dataset must be sorted by the BY variable(s). Let’s take a look at the syntax and output.

PROC SORT DATA=Trial OUT=TrialSorted;
   BY Sex;
RUN;

PROC MEANS DATA=TrialSorted;
   BY Sex;
   VAR Age;
RUN;

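The SAS System
The MEANS Procedure

------------------------------ SEX=F ------------------------------

Analysis Variable : AGE

 N    Mean          Std Dev       Minimum       Maximum
56    42.0892857    12.1464949    19.0000000    69.0000000

------------------------------ SEX=M ------------------------------

Analysis Variable : AGE

 N    Mean          Std Dev       Minimum       Maximum
44    43.2045455    11.9603600    19.0000000    70.0000000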

Now you may be asking yourself: why not just use the CLASS statement, so that you won’t have to sort the data? That is correct, but there may be times when you want to use both a CLASS statement and a BY statement, depending on the problem. In the next example we will use both, with Center as our CLASS variable and Sex as our BY variable. Then we will repeat the analysis using only the CLASS statement.


PROC MEANS DATA=TrialSorted;
   BY Sex;
   CLASS Center;
   VAR Age;
RUN;

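The SAS System
The MEANS Procedure

------------------------------ SEX=F ------------------------------

Analysis Variable : AGE

CENTER   N Obs    N    Mean          Std Dev       Minimum       Maximum
1           24   24    41.3750000    12.3914855    19.0000000    69.0000000
2           20   20    39.6500000    10.7472151    24.0000000    63.0000000
3           12   12    47.5833333    13.1249820    30.0000000    64.0000000

------------------------------ SEX=M ------------------------------

Analysis Variable : AGE

CENTER   N Obs    N    Mean          Std Dev       Minimum       Maximum
1           16   16    41.5000000    13.4956783    19.0000000    70.0000000
2           15   15    41.0666667    10.0247313    24.0000000    58.0000000
3           13   13    47.7692308    11.6415481    27.0000000    65.0000000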


PROC MEANS DATA=TrialSorted;
   CLASS Center Sex;
   VAR Age;
RUN;

The SAS System
The MEANS Procedure

Analysis Variable : AGE

CENTER   SEX   N Obs    N    Mean          Std Dev       Minimum       Maximum
1        F        24   24    41.3750000    12.3914855    19.0000000    69.0000000
         M        16   16    41.5000000    13.4956783    19.0000000    70.0000000
2        F        20   20    39.6500000    10.7472151    24.0000000    63.0000000
         M        15   15    41.0666667    10.0247313    24.0000000    58.0000000
3        F        12   12    47.5833333    13.1249820    30.0000000    64.0000000
         M        13   13    47.7692308    11.6415481    27.0000000    65.0000000
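Besides printing a report, PROC MEANS can save the same Center by Sex statistics to a SAS dataset for use in later steps. The sketch below assumes the Trial dataset used throughout this paper. The output dataset name CellStats and the statistic variable names (N_Age, Mean_Age, SD_Age, Min_Age, Max_Age) are illustrative choices, not names from the original. NWAY keeps only the rows for complete Center and Sex combinations, and NOPRINT suppresses the printed report.

```sas
/* Save per-cell statistics to a dataset instead of printing them.     */
/* Trial is the paper's dataset; CellStats and the *_Age variable       */
/* names are hypothetical, illustrative names.                          */
PROC MEANS DATA=Trial NWAY NOPRINT;
   CLASS Center Sex;            /* one output row per Center*Sex cell   */
   VAR Age;
   OUTPUT OUT=CellStats
          N=N_Age MEAN=Mean_Age STD=SD_Age MIN=Min_Age MAX=Max_Age;
RUN;

/* Display the saved dataset; it also carries the automatic */
/* _TYPE_ and _FREQ_ variables created by PROC MEANS.        */
PROC PRINT DATA=CellStats NOOBS;
RUN;
```

Without NWAY, CellStats would also contain the overall row and the Center-only and Sex-only summary rows, distinguished by the value of _TYPE_.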

We produced the above table concisely, without sorting the data, by using only the CLASS statement. We can still do more to make it pleasing to the eye, so let’s reduce the number of decimal places to two and format the Center and Sex variables.


PROC FORMAT;
   VALUE Centerf 1='1:Austin'
                 2='2:Dallas'
                 3='3:Conroe';
   VALUE $Sexf 'F'='F:Female'
               'M'='M:Male';
RUN;

PROC MEANS DATA=TrialSorted MAXDEC=2;
   TITLE 'Guido''s Guide to PROC MEANS';
   TITLE2 'Example 6 – CLASS, FORMAT and MAXDEC';
   CLASS Center Sex;
   VAR Age;
   FORMAT Center Centerf. Sex $Sexf.;
RUN;

Guido’s Guide to PROC MEANS
Example 6 – CLASS, FORMAT and MAXDEC

The MEANS Procedure

Analysis Variable : AGE

CENTER     SEX        N Obs    N    Mean    Std Dev   Minimum   Maximum
1:Austin   F:Female      24   24    41.38   12.39     19.00     69.00
           M:Male        16   16    41.50   13.50     19.00     70.00
2:Dallas   F:Female      20   20    39.65   10.75     24.00     63.00
           M:Male        15   15    41.07   10.02     24.00     58.00
3:Conroe   F:Female      12   12    47.58   13.12     30.00     64.00
           M:Male        13   13    47.77   11.64     27.00     65.00


Up to this point we have been letting PROC MEANS produce the “default” statistics of N, MEAN, STD DEV, MIN and MAX (see Appendix A for the statistics available in PROC MEANS).

Suppose that we want to see the MEAN, MEDIAN and the 95% Confidence Limits of the Mean. Whenever we want anything other than the default statistics we have to explicitly ask for them.

PROC MEANS DATA=TrialSorted LCLM MEAN UCLM STD MEDIAN MAXDEC=2;
   TITLE 'Guido''s Guide to PROC MEANS';
   TITLE2 'Example 7 – Selected Statistics for Age';
   CLASS Center Sex;
   VAR Age;
   FORMAT Center Centerf. Sex $Sexf.;
RUN;

Guido’s Guide to PROC MEANS
Example 7 – Selected Statistics for Age

The MEANS Procedure

Analysis Variable : AGE

                              Lower 95%             Upper 95%
CENTER     SEX        N Obs   CL for Mean   Mean    CL for Mean   Std Dev   Median
1:Austin   F:Female      24   36.14         41.38   46.61         12.39     41.50
           M:Male        16   34.31         41.50   48.69         13.50     41.00
2:Dallas   F:Female      20   34.62         39.65   44.68         10.75     39.50
           M:Male        15   35.52         41.07   46.62         10.02     42.00
3:Conroe   F:Female      12   39.24         47.58   55.92         13.12     48.00
           M:Male        13   40.73         47.77   54.80         11.64     46.00
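The confidence limits in Example 7 come from the usual t-based interval for a mean: the sample mean plus or minus a t quantile times the standard error. As a worked check for the Austin women (n=24, so 23 degrees of freedom), the t quantile of about 2.069 is taken from standard t tables rather than from the SAS output:

```latex
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\quad\Longrightarrow\quad
41.375 \pm 2.069 \times \frac{12.3914855}{\sqrt{24}}
\approx 41.375 \pm 5.23
\approx (36.14,\ 46.61)
```

This reproduces the LCLM and UCLM printed for that row. The default is α = 0.05, giving 95% limits; the ALPHA= option on the PROC MEANS statement changes it (for example, ALPHA=0.10 for 90% limits).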

We now have a report that transmits the data very succinctly and clearly. Let’s try to do some basic statistical analyses using PROC MEANS.


If we look at the output in Example 7, there appears to be no statistically significant difference between the mean ages of the men and women within any center. For example, in the Austin center the mean age for women is 41.38, with an LCLM of 36.14 and a UCLM of 46.61. The mean age for men is 41.50, with an LCLM of 34.31 and a UCLM of 48.69. As a rough rule of thumb (not a formal test), if the mean for one group falls within the LCLM and UCLM of the other group, the difference between the two groups is unlikely to be statistically significant. Repeating this check for the Dallas center, we find no apparent statistically significant difference in the mean ages of women versus men. Finally, there is also no apparent statistically significant difference in the mean ages of the women versus the men in the Conroe center.
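The confidence-limit comparison above is an informal check. PROC TTEST can run a formal two-sample t-test of mean age between women and men within each center. The sketch below assumes the Trial dataset used in the earlier examples. TrialByCenter is a hypothetical name for the sorted copy that BY processing requires.

```sas
/* BY processing requires the data to be sorted by Center.       */
/* TrialByCenter is a hypothetical name for the sorted copy.      */
PROC SORT DATA=Trial OUT=TrialByCenter;
   BY Center;
RUN;

/* Two-sample t-test of Age, women vs men, run separately for each center. */
PROC TTEST DATA=TrialByCenter;
   BY Center;
   CLASS Sex;     /* the two groups being compared */
   VAR Age;
RUN;
```

For each center, PROC TTEST reports both the pooled (equal-variance) and the Satterthwaite (unequal-variance) versions of the test, along with a test for equality of variances.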

Now let’s try a slightly different statistical analysis and let SAS do the testing. We can consider an example from Chapter 4 of Walker’s book (Walker, 2002). Here is a synopsis of the problem:

Mylitech is developing a new appetite suppressing compound for use in weight reduction. A preliminary study of 35 obese patients provided data before and after 10 weeks of treatment with the new compound. Does the new treatment look at all promising? Let’s take a look at the VIEWTABLE version of the SAS Dataset – Work.Obese


Notice that some subjects have a negative wtloss (they lost weight over the 10 weeks of treatment with the new compound), while others have a positive wtloss (they gained weight). The variable wtloss holds the difference between each subject’s ending weight and beginning weight (wtpre). If the mean of wtloss does not differ significantly from 0, we conclude that there is no statistically significant difference between the beginning and ending weights. PROC MEANS can test this hypothesis, referred to as “The Null Hypothesis”, that the mean of wtloss equals 0.

PROC MEANS DATA=Obese N MEAN STD T PRT MAXDEC=2;
   TITLE 'Guido''s Guide to PROC MEANS';
   TITLE2 'Example 8 – Paired t-Test for Weight Loss';
   VAR wtloss;
RUN;

Guido’s Guide to PROC MEANS
Example 8 – Paired t-Test for Weight Loss

The MEANS Procedure

Analysis Variable : wtloss

 N    Mean    Std Dev   t Value   Pr > |t|
35   -3.46    6.34      -3.23     0.0028

If we examine the output from Example 8, we find that for the 35 subjects the mean change in weight (wtloss) is -3.46 pounds, the standard deviation is 6.34, the t-value is -3.23 and the p-value is 0.0028. If the p-value is less than 0.05, we may reject “The Null Hypothesis”. The p-value is 0.0028, so we reject “The Null Hypothesis” and conclude that there is a statistically significant difference between the pre- and post-treatment weights of the 35 subjects. On average, they lost about 3.46 pounds.
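The t Value in Example 8 is the mean difference divided by its standard error. Recomputing it from the rounded values printed with MAXDEC=2 gives:

```latex
t = \frac{\bar{d}}{s_d/\sqrt{n}}
  = \frac{-3.46}{6.34/\sqrt{35}}
  \approx \frac{-3.46}{1.072}
  \approx -3.23,
\qquad \mathit{df} = n - 1 = 34
```

The Pr > |t| column (the PRT keyword) is the two-tailed p-value for this t with 34 degrees of freedom.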

There are other procedures in the SAS System that can answer this question. You could use PROC UNIVARIATE, which gives a plethora of output; PROC SUMMARY, which produces no printed output by default; or, since version 7 of the SAS System, PROC TTEST to do the paired t-test analysis.
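Two of these alternatives are sketched below, assuming the Obese dataset and the wtloss variable from Example 8. PROC SUMMARY accepts the same statistic keywords as PROC MEANS but prints only when the PRINT option is given. PROC TTEST with H0=0 tests whether the mean of wtloss is 0. Because wtloss is already the within-subject difference, this matches the paired analysis. The post-treatment weight variable is not named in this paper, so the PAIRED form appears only as a comment with a hypothetical name.

```sas
/* PROC SUMMARY computes the same statistics as PROC MEANS but */
/* prints nothing unless the PRINT option is specified.         */
PROC SUMMARY DATA=Obese PRINT N MEAN STD T PRT MAXDEC=2;
   VAR wtloss;
RUN;

/* One-sample t-test of H0: mean(wtloss) = 0. Since wtloss is the    */
/* within-subject difference, this is equivalent to a paired t-test. */
PROC TTEST DATA=Obese H0=0;
   VAR wtloss;
RUN;

/* If the dataset also held the post-treatment weight (wtpost is a */
/* hypothetical name), the paired form would test wtpost - wtpre:  */
/*    PROC TTEST DATA=Obese; PAIRED wtpost*wtpre; RUN;             */
```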

We have completed our tutorial, and now the rest is up to you. The best way to improve your SAS skills is to practice, practice, and practice. The SAS Online Help facility and the SAS manuals are excellent resources for this. Both are available under the Help drop-down menu (Learning SAS Programming and SAS Help and Documentation).


APPENDIX A – STATISTIC KEYWORDS FOR PROC MEANS STATEMENT

DESCRIPTIVE STATISTIC KEYWORDS

CLM – Two-sided Confidence Limits of the Mean
CSS – Corrected Sum of Squares
CV – Coefficient of Variation
KURTOSIS|KURT – Kurtosis
LCLM – Lower Confidence Limit of the Mean
MAX – Maximum
MEAN – Average
MIN – Minimum
N – Number of non-missing values
NMISS – Number of missing values
RANGE – Maximum minus Minimum
SKEWNESS|SKEW – Skewness
STDDEV|STD – Standard Deviation
STDERR – Standard Error of the Mean
SUM – Sum of the values
SUMWGT – Sum of the Weights
UCLM – Upper Confidence Limit of the Mean
USS – Uncorrected Sum of Squares
VAR – Variance

QUANTILE STATISTIC KEYWORDS

MEDIAN|P50 – Median or 50th Percentile
P1 – 1st Percentile
P5 – 5th Percentile
P10 – 10th Percentile
Q1|P25 – 1st Quartile or 25th Percentile
Q3|P75 – 3rd Quartile or 75th Percentile
P90 – 90th Percentile
P95 – 95th Percentile
P99 – 99th Percentile
QRANGE – Interquartile Range (Q3 – Q1)

HYPOTHESIS STATISTIC KEYWORDS

PROBT|PRT – Two-tailed p-value for Student’s t statistic
T – Student’s t statistic
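As a sketch of how these keywords are combined, the following step requests several descriptive and quantile statistics for Age. It assumes the Trial dataset from the earlier examples. Only the requested statistics are printed, in the order listed.

```sas
/* Mix descriptive and quantile keywords from Appendix A for Age.  */
/* NMISS shows how many Age values are missing; QRANGE = Q3 - Q1.   */
PROC MEANS DATA=Trial N NMISS P5 Q1 MEDIAN Q3 P95 QRANGE MAXDEC=2;
   VAR Age;
RUN;
```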


CONCLUSION

PROC MEANS is a simple yet very powerful and essential procedure in SAS. This beginning tutorial has just scratched the surface of the functionality of PROC MEANS. The author’s hope is that these basic examples will serve as a guide for users to extend their knowledge of PROC MEANS and to experiment with other uses for their specific data needs.

REFERENCES

SAS Institute, Inc. (2002). Base SAS® 9 Procedures Guide. Cary, NC: SAS Institute, Inc.

Guido, . (2007). “Guido’s Guide to PROC FREQ – A Tutorial for Beginners Using the SAS® System”, Proceedings of the 20th annual North East SAS Users Group Conference, Baltimore, MD, 2007, paper #FF07.

Walker, . (2002). “Common Statistical Methods for Clinical Research with SAS® Examples”, 2nd Edition, SAS Institute: Cary, NC.

ACKNOWLEDGEMENTS

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

University of Rochester Medical Center

Department of Community and Preventive Medicine Division of Social and Behavioral Medicine

120 Corporate Woods, Suite 350

Rochester, 14623

Phone: (585) 758-7818

Fax: (585) 424-1469
