Big Data Prediction by Regression


The amount of data created, and potentially collected, every day by the interactions of individuals with their computers, GPS devices, cell phones, social media, medical devices, and other sources has been termed "big data." Analysis of big data is especially needed in predictive modeling, which often uses a large number of observations and predictor variables to predict a binary response representing an individual's likely future behavior. SAS customers want to analyze big data accurately and easily, particularly for predictive modeling. The SAS System provides a framework for statistical analysis, with extensive data manipulation capabilities for preparing data for analysis and modeling, and reporting tools for presenting results.

Regression analysis is a technique for predicting the unknown value of a variable from the known values of one or more other variables. More precisely, if X and Y are two related variables, linear regression helps us predict the value of Y for a given value of X (or, by fitting the reverse regression of X on Y, the value of X for a given value of Y).

Prediction methods can be divided into two general groups according to whether the outcome is discrete or continuous. Predicting a discrete outcome is called classification; related terms include discriminant analysis and pattern recognition. Predicting a continuous outcome is called regression; related terms include smoothing and curve estimation. In this project, we apply regression to a continuous data set and try to predict outcomes with minimum error.
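To illustrate predicting Y from a given X with minimum error, here is a minimal Python sketch of simple linear regression fitted by ordinary least squares. It assumes a single predictor and uses mean squared error as the error measure. It is not SAS code and does not reproduce any SAS procedure; the function names (`fit_simple_linear_regression`, `predict`, `mean_squared_error`) and the sample values are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def fit_simple_linear_regression(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares and return (b0, b1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = fmean(xs)
    y_bar = fmean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0, b1, x):
    """Predicted value of Y for a given X."""
    return b0 + b1 * x


def mean_squared_error(ys, preds):
    """Average squared difference between observed and predicted values."""
    return fmean((y - p) ** 2 for y, p in zip(ys, preds))


if __name__ == "__main__":
    # Hypothetical example data, not taken from the project's data set.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    b0, b1 = fit_simple_linear_regression(xs, ys)
    preds = [predict(b0, b1, x) for x in xs]
    print(f"intercept={b0:.2f} slope={b1:.2f}")
    print(f"MSE={mean_squared_error(ys, preds):.4f}")
```

Least squares chooses the intercept and slope that minimize the sum of squared residuals, so any other straight line fitted to the same data yields a larger mean squared error.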