By Ron Cody
Find errors and clean up data easily using SAS!
Thoroughly updated, Cody's Data Cleaning Techniques Using SAS, Third Edition, addresses tasks that nearly every data analyst needs to do - that is, make sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify to make your job of data cleaning easier, faster, and more efficient.
Building on both the author's experience gained from teaching a data cleaning course for over 10 years and advances in SAS, this third edition includes four new chapters, covering topics such as the use of Perl regular expressions for checking the format of character values (such as zip codes or email addresses) and how to standardize company names and addresses.
With this book, you will learn how to:
- find and correct errors in character and numeric values
- develop programming techniques related to dates and missing values
- deal with highly skewed data
- develop techniques for correcting your data errors
- use integrity constraints and audit trails to prevent errors from being added to a clean data set
Additional info for Cody's Data Cleaning Techniques Using SAS
Let's modify the program by adding a DATA step, so that IDs with invalid character values are listed.
) = 'Miscoded' then put Patno= AE=; run;
The "heart" of this program is the PUT function. To review, the PUT function is similar to the INPUT function. It takes the following form:
character_variable = put(variable, format)
where character_variable is a character variable that contains the value of the variable listed as the first argument to the function, formatted by the format listed as the second argument to the function.
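The PUT-function technique described in this excerpt can be sketched as follows. This is a minimal illustration, not the book's program: the format name $ae_check, the data set PATIENTS_DEMO, the sample records, and the rule that only '0' and '1' are valid AE values are all assumptions made for the example.

```sas
/* Hypothetical format: maps each AE value to a validity category. */
proc format;
   value $ae_check '0','1' = 'Valid'
                   ' '     = 'Missing'
                   other   = 'Miscoded';
run;

/* Hypothetical sample data: patient number in columns 1-3, AE in column 5. */
data patients_demo;
   input Patno $ 1-3 AE $ 5;
datalines;
001 0
002 1
003 A
004
005 2
;

/* PUT applies the format to AE and returns the category as a character */
/* string; only records whose category is 'Miscoded' are listed.        */
data _null_;
   set patients_demo;
   file print;
   if put(AE, $ae_check.) = 'Miscoded' then put Patno= AE=;
run;
```

With these sample records, patients 003 and 005 are listed, because their AE values fall into the OTHER category of the format. The missing value for patient 004 is categorized separately, so it is not reported as miscoded.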
Inspection of the minimum and maximum values for all three variables shows that there are probably some data errors in the PATIENTS data set. If you want a slightly prettier output, you can use PROC TABULATE to accomplish the same task. For an excellent reference on PROC TABULATE, let me suggest a book written by Lauren E. , Cary, NC, as part of the SAS Press Series. Here is the equivalent PROC TABULATE program, followed by the output. 3 tells the procedure to use this format (a field width of 7 with 3 places to the right of the decimal point) for all the output, unless otherwise specified.
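A PROC TABULATE step of the kind this excerpt describes might look like the sketch below. The data set VITALS_DEMO, the variable names HR, SBP, and DBP, and the sample values are assumptions for illustration. The default format of 7.3 is inferred from the description above (a field width of 7 with 3 places to the right of the decimal point).

```sas
/* Hypothetical sample data with a few deliberately suspicious values. */
data vitals_demo;
   input Patno $ HR SBP DBP;
datalines;
001 68 120 80
002 72 130 84
003 900 118 76
004 . 400 200
005 58 20 8
;

/* FORMAT=7.3 on the PROC statement sets the default cell format.  */
/* The counts override it with a 7.0 format, so they print as      */
/* whole numbers.                                                   */
proc tabulate data=vitals_demo format=7.3;
   title "Statistics for Numeric Variables";
   var HR SBP DBP;
   tables HR SBP DBP,
          n*f=7.0 nmiss*f=7.0 mean min max / rtspace=18;
   keylabel n     = 'Number'
            nmiss = 'Missing'
            mean  = 'Mean'
            min   = 'Lowest'
            max   = 'Highest';
run;
```

Each analysis variable appears as a row, so the lowest and highest values for all three can be compared at a glance.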
You may see that most of the data values fall within a certain range and values beyond this range can be examined. This chapter develops programs based on these ideas.
Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
One of the simplest ways to check for invalid numeric values is to run either PROC MEANS or PROC UNIVARIATE. By default, PROC MEANS lists the minimum and maximum values, along with the n, mean, and standard deviation. PROC UNIVARIATE is somewhat more useful in detecting invalid values, because it provides you with a listing of the five highest and five lowest values (this number can be changed with options), along with graphical output (stem-and-leaf plots and box plots).
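As a rough sketch of the two procedures just described, the steps below run PROC MEANS and PROC UNIVARIATE on a small hypothetical data set. The data set name VITALS_DEMO, the variables HR, SBP, and DBP, and the sample values are assumptions for illustration, not taken from the book.

```sas
/* Hypothetical sample data with a few deliberately suspicious values. */
data vitals_demo;
   input Patno $ HR SBP DBP;
datalines;
001 68 120 80
002 72 130 84
003 900 118 76
004 . 400 200
005 58 20 8
;

/* Request the count, missing count, minimum, maximum, mean, and */
/* standard deviation explicitly for each variable.               */
proc means data=vitals_demo n nmiss min max mean std maxdec=1;
   title "Checking Numeric Variables with PROC MEANS";
   var HR SBP DBP;
run;

/* NEXTROBS= changes how many extreme observations are listed      */
/* (the default is 5); the ID statement labels each extreme value  */
/* with the patient number.                                         */
proc univariate data=vitals_demo nextrobs=3;
   title "Checking Heart Rate with PROC UNIVARIATE";
   id Patno;
   var HR;
run;
```

The ID statement makes the extreme-observations listing easier to act on, because each suspicious value is reported alongside the patient it belongs to.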