**Introduction**

Last year I posted an article called What Do Predictive Analytics Consultants Do? Part 1, describing the general types of activities that we engage in. In the present article, I want to talk about the skills and tools that one should have to perform **Predictive Analytics**. Although this is not strictly a "What we do" article, knowing the skills we possess and the tools we use will provide some insight into what we do, without talking about some algorithm that you may have never heard of.

**What's Not In This Series?**

I am always at a loss in describing the skills of analytics, for there are many. I just completed a new book about analytics (available for **FREE**; see notes) that takes a different approach than *Predictive Analytics using R* (also available for **FREE**), though I am using material from three of its chapters. The new book is an operations research approach to analytics, *Operations Research using Open-Source Tools*, covering a different set of methods, skills, and tools. Combined, the two books are over 1,000 pages, so perhaps you can see my dilemma. Hence, this article will touch only the very basics.

The more technical aspects of **Predictive Analytics** are not in this series for two reasons. First, this series is intended for a **general audience**, including analytics program managers and managers who have no idea what Predictive Analytics is about. Second, the LinkedIn publishing platform is **not designed for writing technical articles** about analytics: it does not support any special formatting for showing code, equations, tables, and so on. WordPress is much better for this, so you will find those kinds of articles at bicorner.com. The most recent is *Random Forest using Python*.

**What is Predictive Analytics?**

In case you missed my previous article, this is a high-level description. **Predictive Analytics**, sometimes used synonymously with *predictive modeling*, **is not synonymous with statistics**: it often requires modifying functional forms and using ad hoc procedures, which makes it, to some degree, a part of data science. It does, however, **encompass a variety of statistical techniques** for modeling, **incorporate machine learning**, and **utilize data mining** to analyze current and historical facts and make predictions about the future. Beyond the statistical aspect lies **a mathematical modeling and programming dimension**, which includes **linear optimization and simulation**, for example. Yet analytics goes even further, defining the business case and requirements, which are not covered here; I discussed those in How to Build a Model.

**Statistical Modeling and Tools**

This assumes that you already know **the basics of parametric and some nonparametric statistics**. If you are not familiar with these terms, then you are missing a prerequisite; however, this is a gap you can fill with online courses from Coursera. Though I have never taken one, I have many colleagues who swear by them.

By **statistical modeling**, I am referring to subject matter beyond the material covered in a statistics course for engineering or business. Here we are concerned with **linear regression**, **logistic regression**, **analysis of variance** (ANOVA), **multivariate regression**, and **cluster analysis**, as well as goodness-of-fit testing, hypothesis testing, experimental design, and my friends Kolmogorov and Smirnov. **Mathematical statistics** could be a plus, as it will take you into the underlying theory.
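To make the first of these concrete, here is a minimal sketch of simple linear regression fit by ordinary least squares in plain Python. The data is invented for illustration; in practice you would use R's `lm()`, a SAS procedure, or a Python library such as statsmodels.

```python
# Ordinary least squares for a simple linear model y = a + b*x,
# using the closed-form formulas for slope and intercept.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x          # Intercept
    return a, b

# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))      # 0.09 1.99
```

The same covariance-over-variance idea underlies what `lm()` and PROC REG do for you, just generalized to many predictors.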

The tools one could use are myriad and are often the tools our company or customer has already deployed. SAS modeling products are well-established tools of the trade, including *SAS/STAT*, *SAS Enterprise Guide*, *SAS Enterprise Miner*, and others. IBM made its mark on the market with the purchase of *Clementine* and its repackaging as *IBM SPSS Modeler*. There are other commercial products, like *Tableau*. I have to mention *Excel* here, for it is all many will have to work with; but you have to go beyond the basics and into its data tools, its statistical analysis tools, and perhaps its linear programming *Solver*, plus be able to construct pivot tables, and so on.

If you want to learn *SAS Enterprise Guide*, SAS has made that very easy to do. Anyone can use SAS EG for learning, regardless of status (non-student, student, professor, professional, etc.), at http://www.sas.com/en_us/software/university-edition.html, though it is restricted to personal use only.

Today, there is a multitude of open-source tools that have become popular, including *R* and its IDE, *RStudio*; the *S* programming package; and the *Python* programming language (one of the most used languages in 2014). *R*, for example, is every bit as good as its nemesis *SAS*, but I have yet to get it to leverage the enormous amount of data that I can with *SAS*. Part of this is due to server capacity and allocation, so I really don't know how much data *R* can handle.

**Data Processing**

For the foregoing methods, data is necessary, and it will probably not be handed to you on a silver platter, ready for consumption. It may be "dirty," in the wrong format, incomplete, or just not right. Since this is where you may spend an abundant amount of time, you need skill with tools to process data. Even if this is a secondary task (it has not been for me), you will probably need to know *Structured Query Language* (SQL) and something about the structure of databases.

If you do not have clean, complete, and reliable data to model with, you are doomed. You may have to remove inconsistencies, impute missing values, and so on. Then you have to analyze the data, perform data reduction, and integrate the data so that it is ready for use. Modeling with "bad" data results in a "bad" model!
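As one tiny illustration of such cleaning, here is mean imputation of missing values in plain Python. The records and field names are hypothetical; in practice this step is often done in SQL or with a library such as pandas.

```python
# Fill missing values in a column with the mean of the observed values.
from statistics import mean

records = [
    {"customer": "A", "age": 34,   "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.0},   # missing age
    {"customer": "C", "age": 51,   "spend": None},   # missing spend
]

def impute_mean(rows, field):
    # Mean of the non-missing entries in this column
    fill = mean(r[field] for r in rows if r[field] is not None)
    for r in rows:
        if r[field] is None:
            r[field] = fill
    return rows

for col in ("age", "spend"):
    impute_mean(records, col)
print(records[1]["age"], records[2]["spend"])   # 42.5 100.0
```

Mean imputation is only the crudest option; whether it is appropriate depends on why the values are missing, which is exactly the kind of judgment the modeler has to supply.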

Databases are plentiful and come in the form of *Oracle Exadata*, *Teradata*, *Microsoft SQL Server Parallel Data Warehouse*, *IBM Netezza*, and *Vertica*. The *Greenplum Database* builds on the foundations of the open-source database *PostgreSQL*. Or you may need to use a data platform like *Hadoop*. Also, *Excel* has the capacity to store "small amounts" of data across multiple worksheets, along with built-in data processing tools.

**Mathematical Modeling**

Again, there are prerequisites, such as **differential and integral calculus and linear algebra**. Multivariate calculus is a plus, particularly if you will be building models involving differential equations and nonlinear optimization. The skills you need to acquire beyond the basics include **mathematical programming**: linear, integer, mixed, and nonlinear. **Goal programming**, **game theory**, **Markov chains**, and **queuing theory**, to name a few, may be required. Mathematical studies in real and complex analysis and linear vector spaces, as well as abstract algebraic concepts like groups, fields, and rings, can reveal the foundational theory.
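To give a flavor of one of these topics, here is a toy two-state Markov chain in Python. The states and transition probabilities are invented for illustration: repeatedly applying the transition matrix drives any starting distribution toward the chain's steady state.

```python
# A toy Markov chain over two customer states, "active" and "churned".
P = [[0.9, 0.1],    # P(active -> active),  P(active -> churned)
     [0.3, 0.7]]    # P(churned -> active), P(churned -> churned)

def step(dist, P):
    # One transition: dist' = dist * P
    return [sum(dist[i] * P[i][j] for i in range(len(P)))
            for j in range(len(P[0]))]

dist = [1.0, 0.0]            # start with everyone active
for _ in range(50):
    dist = step(dist, P)
print([round(p, 3) for p in dist])   # [0.75, 0.25]
```

For this chain the steady state can also be found directly by solving the balance equations, which gives the same 75/25 split; iterating the matrix is simply the computational shortcut.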

**Simulation modeling**, including **Monte Carlo**, **discrete and continuous time**, and **discrete event simulation**, can be applied in analytics. I have not seen this as common practice in business analytics, but it certainly has its place. These models may rely heavily upon queuing theory, Markov chains, inventory theory, and network theory.
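As a minimal sketch of the Monte Carlo idea, the following Python snippet estimates the chance that uncertain total demand exceeds a capacity limit. The product lines, distributions, and numbers are all invented for illustration.

```python
# Monte Carlo estimate of a capacity-shortfall probability.
import random

random.seed(42)              # fixed seed for reproducible runs
CAPACITY = 1200
TRIALS = 100_000

exceed = 0
for _ in range(TRIALS):
    # Total demand from three product lines, each normally distributed
    demand = sum(random.gauss(mu, sigma)
                 for mu, sigma in [(400, 50), (350, 40), (300, 60)])
    if demand > CAPACITY:
        exceed += 1

print(exceed / TRIALS)       # estimated probability of a shortfall
```

With these assumed distributions the answer could also be computed analytically (the sum of independent normals is normal), but the simulation approach keeps working when the inputs are correlated, skewed, or defined only by historical data.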

The corporate mainstay is the powerhouse combination of *MATLAB* and *Simulink*. *MATLAB* stands for *MATrix LABoratory* (that is why it is spelled with all caps!). Other noteworthy commercial products include *Mathematica* and *Analytica*. *Octave* is an open-source mathematical modeling tool that reads MATLAB code, and there are add-on GUI environments for it (like *RStudio* for *R*) floating around in hyperspace. I recently discovered the power of *Scilab* and the world of modules (packages) that are available for this open-source gem.

For simulation, *Simulink* works "on top of" *MATLAB* functions and code for a variety of simulation models. I wrote the book *Missile Flight Simulation* using *MATLAB* and *Simulink*. *ExtendSim* is an excellent tool for discrete event simulation and the subject of my book *Discrete Event Simulation using ExtendSim*. In *Scilab*, I have used *Xcos* for discrete event simulation and *Quapro* for linear programming; both are featured in my next book.

There is a general analytics tool that I do not know much about yet: *BOARD*, whose newest release boasts a predictive analytics capability. I will be speaking on predictive analytics at the *BOARD* User Conference, April 13th-14th in San Diego. Again, I would be remiss not to mention *Excel*, particularly the *Solver* add-in for mathematical programming. Another third-party add-in to consider is @RISK.

**Conclusion**

If you aspire to become an **analytics consultant** or **scientist**, you have a wealth of open-source tools, free training, and online tutorials at your fingertips. If you are already working in analytics, you can readily specialize in predictive analytics. If you are already working in predictive analytics, you have what you need to become an expert. All of these tools will run either on your PC's native processing power or through a virtual machine or remote server (when using *Hadoop*, for example).

**A few other articles on Predictive Analytics**

- A 12 Step Approach to Analytical Modeling
- Predictive Modeling in Analytics
- What is *Humalytica*?
- A Dangerous Game We Play
- What is Operations Research?
- How can I be a Data Scientist?
- Why you might not want to be a Data Scientist
- Data Scientists are Dead, Long Live Data Science!
- Why you should care about Statistics
- Call Center Analytics: What's Missing?

**Notes**

If you want one of my books as a PDF version, either go to my website and download what is available there or send an e-mail request to jeff@humalytica.com.