From Linear Models to Machine Learning

Author: Norman Matloff

Publisher: CRC Press

ISBN: 1351645897

Category: Business & Economics

Page: 490

View: 1773

Statistical Regression and Classification: From Linear Models to Machine Learning takes an innovative look at the traditional statistical regression course, presenting a contemporary treatment in line with today's applications and users. The text takes a modern look at regression:

* A thorough treatment of classical linear and generalized linear models, supplemented with introductory material on machine learning methods.
* Since classification is the focus of many contemporary applications, the book covers this topic in detail, especially the multiclass case.
* In view of the voluminous nature of many modern datasets, there is a chapter on Big Data.
* Has special Mathematical and Computational Complements sections at ends of chapters, and exercises are partitioned into Data, Math and Complements problems.
* Instructors can tailor coverage for specific audiences such as majors in Statistics, Computer Science, or Economics.
* More than 75 examples using real data.

The book treats classical regression methods in an innovative, contemporary manner. Though some statistical learning methods are introduced, the primary methodology used is linear and generalized linear parametric models, covering both the Description and Prediction goals of regression methods. The author is just as interested in Description applications of regression, such as measuring the gender wage gap in Silicon Valley, as in forecasting tomorrow's demand for bike rentals. An entire chapter is devoted to measuring such effects, including discussion of Simpson's Paradox, multiple inference, and causation issues. Similarly, there is an entire chapter on parametric model fit, making use of both residual analysis and assessment via nonparametric analysis.

Norman Matloff is a professor of computer science at the University of California, Davis, and was a founder of the Statistics Department at that institution. His current research focus is on recommender systems, and applications of regression methods to small area estimation and bias reduction in observational studies. He is on the editorial boards of the Journal of Statistical Computation and the R Journal. An award-winning teacher, he is the author of The Art of R Programming and Parallel Computation in Data Science: With Examples in R, C++ and CUDA.

Linear Models in Statistics

Author: N. H. Bingham,John M. Fry

Publisher: Springer Science & Business Media

ISBN: 9781848829695

Category: Mathematics

Page: 284

View: 5820

Regression is the branch of Statistics in which a dependent variable of interest is modelled as a linear combination of one or more predictor variables, together with a random error. The subject is inherently two- or higher-dimensional, thus an understanding of Statistics in one dimension is essential. Regression: Linear Models in Statistics fills the gap between introductory statistical theory and more specialist sources of information. In doing so, it provides the reader with a number of worked examples, and exercises with full solutions. The book begins with simple linear regression (one predictor variable), and analysis of variance (ANOVA), and then further explores the area through inclusion of topics such as multiple linear regression (several predictor variables) and analysis of covariance (ANCOVA). The book concludes with special topics such as non-parametric regression and mixed models, time series, spatial processes and design of experiments. Aimed at 2nd and 3rd year undergraduates studying Statistics, Regression: Linear Models in Statistics requires a basic knowledge of (one-dimensional) Statistics, as well as Probability and standard Linear Algebra. Possible companions include John Haigh’s Probability Models, and T. S. Blyth and E. F. Robertson’s Basic Linear Algebra and Further Linear Algebra.

A Bayesian Course with Examples in R and Stan

Author: Richard McElreath

Publisher: CRC Press

ISBN: 1315362619

Category: Mathematics

Page: 487

View: 4552

Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds readers’ knowledge of and confidence in statistical modeling. Reflecting the need for even minor programming in today’s model-based statistics, the book pushes readers to perform step-by-step calculations that are usually automated. This unique computational approach ensures that readers understand enough of the details to make reasonable choices and interpretations in their own modeling work. The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation. By using complete R code examples throughout, this book provides a practical foundation for performing statistical inference. Designed for both PhD students and seasoned professionals in the natural and social sciences, it prepares them for more advanced or specialized statistical modeling.

Web Resource

The book is accompanied by an R package (rethinking) that is available on the author’s website and GitHub. The two core functions (map and map2stan) of this package allow a variety of statistical models to be constructed from standard model formulas.

Linear and Nonlinear Modeling

Author: Sadanori Konishi

Publisher: CRC Press

ISBN: 1466567287

Category: Mathematics

Page: 338

View: 7297

Select the Optimal Model for Interpreting Multivariate Data

Introduction to Multivariate Analysis: Linear and Nonlinear Modeling shows how multivariate analysis is widely used for extracting useful information and patterns from multivariate data and for understanding the structure of random phenomena. Along with the basic concepts of various procedures in traditional multivariate analysis, the book covers nonlinear techniques for clarifying phenomena behind observed multivariate data. It primarily focuses on regression modeling, classification and discrimination, dimension reduction, and clustering. The text thoroughly explains the concepts and derivations of the AIC, BIC, and related criteria and includes a wide range of practical examples of model selection and evaluation criteria. To estimate and evaluate models with a large number of predictor variables, the author presents regularization methods, including the L1 norm regularization that gives simultaneous model estimation and variable selection. For advanced undergraduate and graduate students in statistical science, this text provides a systematic description of both traditional and newer techniques in multivariate analysis and machine learning. It also introduces linear and nonlinear statistical modeling for researchers and practitioners in industrial and systems engineering, information science, life science, and other areas.

Author: Piotr Kokoszka,Matthew Reimherr

Publisher: CRC Press

ISBN: 1498746691

Category: Mathematics

Page: 290

View: 2890

Introduction to Functional Data Analysis provides a concise textbook introduction to the field. It explains how to analyze functional data, both at exploratory and inferential levels. It also provides a systematic and accessible exposition of the methodology and the required mathematical framework. The book can be used as a textbook for a semester-long course on FDA for advanced undergraduate or MS statistics majors, as well as for MS and PhD students in other disciplines, including applied mathematics, environmental science, public health, medical research, geophysical sciences and economics. It can also be used for self-study and as a reference for researchers in those fields who wish to acquire a solid understanding of FDA methodology and practical guidance for its implementation. Each chapter contains plentiful examples of relevant R code and theoretical and data analytic problems. The material of the book can be roughly divided into four parts of approximately equal length: 1) basic concepts and techniques of FDA, 2) functional regression models, 3) sparse and dependent functional data, and 4) introduction to the Hilbert space framework of FDA. The book assumes an advanced undergraduate background in calculus, linear algebra, distributional probability theory, foundations of statistical inference, and some familiarity with R programming. Other required statistics background is provided in scalar settings before the related functional concepts are developed. Most chapters end with references to more advanced research for those who wish to gain a more in-depth understanding of a specific topic.

Author: Francesco Bartolucci,Alessio Farcomeni,Fulvia Pennoni

Publisher: CRC Press

ISBN: 1466583711

Category: Mathematics

Page: 252

View: 3707

Drawing on the authors’ extensive research in the analysis of categorical longitudinal data, Latent Markov Models for Longitudinal Data focuses on the formulation of latent Markov models and the practical use of these models. Numerous examples illustrate how latent Markov models are used in economics, education, sociology, and other fields. The R and MATLAB® routines used for the examples are available on the authors’ website. The book provides you with the essential background on latent variable models, particularly the latent class model. It discusses how the Markov chain model and the latent class model represent a useful paradigm for latent Markov models. The authors illustrate the assumptions of the basic version of the latent Markov model and introduce maximum likelihood estimation through the Expectation-Maximization algorithm. They also cover constrained versions of the basic latent Markov model, describe the inclusion of the individual covariates, and address the random effects and multilevel extensions of the model. After covering advanced topics, the book concludes with a discussion on Bayesian inference as an alternative to maximum likelihood inference. As longitudinal data become increasingly relevant in many fields, researchers must rely on specific statistical and econometric models tailored to their application. A complete overview of latent Markov models, this book demonstrates how to use the models in three types of analysis: transition analysis with measurement errors, analyses that consider unobserved heterogeneity, and finding clusters of units and studying the transition between the clusters.

A Practical Approach with Examples in R, SAS, and BUGS

Author: Kris Bogaerts,Arnost Komarek,Emmanuel Lesaffre

Publisher: CRC Press

ISBN: 1351643053

Category: Mathematics

Page: 584

View: 4218

Survival Analysis with Interval-Censored Data: A Practical Approach with Examples in R, SAS, and BUGS provides the reader with a practical introduction to the analysis of interval-censored survival times. Although many theoretical developments have appeared in the last fifty years, interval censoring is often ignored in practice. Many are unaware of the impact of inappropriately dealing with interval censoring. In addition, the necessary software is at times difficult to trace. This book fills the gap between theory and practice.

Features:

- Provides an overview of frequentist as well as Bayesian methods.
- Includes a focus on practical aspects and applications.
- Extensively illustrates the methods with examples using R, SAS, and BUGS. Full programs are available on a supplementary website.

The authors:

Kris Bogaerts is project manager at I-BioStat, KU Leuven. He received his PhD in science (statistics) at KU Leuven on the analysis of interval-censored data. He has gained expertise in a great variety of statistical topics with a focus on the design and analysis of clinical trials.

Arnošt Komárek is associate professor of statistics at Charles University, Prague. His subject area of expertise covers mainly survival analysis with the emphasis on interval-censored data and classification based on longitudinal data. He is past chair of the Statistical Modelling Society and editor of Statistical Modelling: An International Journal.

Emmanuel Lesaffre is professor of biostatistics at I-BioStat, KU Leuven. His research interests include Bayesian methods, longitudinal data analysis, statistical modelling, analysis of dental data, interval-censored data, misclassification issues, and clinical trials. He is the founding chair of the Statistical Modelling Society, past-president of the International Society for Clinical Biostatistics, and fellow of ISI and ASA.

Author: P. McCullagh,John A. Nelder

Publisher: CRC Press

ISBN: 9780412317606

Category: Mathematics

Page: 532

View: 9376

The success of the first edition of Generalized Linear Models led to the updated Second Edition, which continues to provide a definitive, unified treatment of methods for the analysis of diverse types of data. Today, it remains popular for its clarity, richness of content and direct relevance to agricultural, biological, health, engineering, and other applications. The authors focus on examining the way a response variable depends on a combination of explanatory variables, treatment, and classification variables. They give particular emphasis to the important case where the dependence occurs through some unknown, linear combination of the explanatory variables. The Second Edition includes topics added to the core of the first edition, including conditional and marginal likelihood methods, estimating equations, and models for dispersion effects and components of dispersion. The discussion of other topics (log-linear and related models, log odds-ratio regression models, multinomial response models, inverse linear and related models, quasi-likelihood functions, and model checking) was expanded and incorporates significant revisions. Comprehension of the material requires simply a knowledge of matrix theory and the basic ideas of probability theory, but for the most part, the book is self-contained. Therefore, with its worked examples, plentiful exercises, and topics of direct use to researchers in many disciplines, Generalized Linear Models serves as an ideal text, self-study guide, and reference.

Author: Jon Wakefield

Publisher: Springer Science & Business Media

ISBN: 1441909257

Category: Mathematics

Page: 697

View: 7446

Bayesian and Frequentist Regression Methods provides a modern account of both Bayesian and frequentist methods of regression analysis. Many texts cover one or the other of the approaches, but this is the most comprehensive combination of Bayesian and frequentist methods that exists in one place. The two philosophical approaches to regression methodology are featured here as complementary techniques, with theory and data analysis providing supplementary components of the discussion. In particular, methods are illustrated using a variety of data sets. The majority of the data sets are drawn from biostatistics but the techniques are generalizable to a wide range of other disciplines.

Linear Modeling for Unbalanced Data, Second Edition

Author: Ronald Christensen

Publisher: CRC Press

ISBN: 1498774059

Category: Mathematics

Page: 610

View: 9841

Analysis of Variance, Design, and Regression: Linear Modeling for Unbalanced Data, Second Edition presents linear structures for modeling data with an emphasis on how to incorporate specific ideas (hypotheses) about the structure of the data into a linear model for the data. The book carefully analyzes small data sets by using tools that are easily scaled to big data. The tools also apply to small relevant data sets that are extracted from big data.

New to the Second Edition

* Reorganized to focus on unbalanced data
* Reworked balanced analyses using methods for unbalanced data
* Introductions to nonparametric and lasso regression
* Introductions to general additive and generalized additive models
* Examination of homologous factors
* Unbalanced split plot analyses
* Extensions to generalized linear models
* R, Minitab®, and SAS code on the author’s website

The text can be used in a variety of courses, including a yearlong graduate course on regression and ANOVA or a data analysis course for upper-division statistics students and graduate students from other fields. It places a strong emphasis on interpreting the range of computer output encountered when dealing with unbalanced data.

The Lasso and Generalizations

Author: Trevor Hastie,Robert Tibshirani,Martin Wainwright

Publisher: CRC Press

ISBN: 1498712177

Category: Business & Economics

Page: 367

View: 9400

Discover New Methods for Dealing with High-Dimensional Data

A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of l1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.

Using GAMLSS in R

Author: Mikis D. Stasinopoulos,Robert A. Rigby,Gillian Z. Heller,Vlasios Voudouris,Fernanda De Bastiani

Publisher: CRC Press

ISBN: 1351980378

Category: Mathematics

Page: 571

View: 3488

This book is about learning from data using the Generalized Additive Models for Location, Scale and Shape (GAMLSS). GAMLSS extends the Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs) to accommodate large complex datasets, which are increasingly prevalent. GAMLSS allows any parametric distribution for the response variable, and allows all the parameters (location, scale and shape) of the distribution to be modelled as linear or smooth functions of explanatory variables. This book provides a broad overview of GAMLSS methodology and how it is implemented in R. It includes a comprehensive collection of real data examples, integrated code, and figures to illustrate the methods, and is supplemented by a website with code, data and additional materials.

Author: Richard A. Berk

Publisher: Springer

ISBN: 3319440489

Category: Mathematics

Page: 347

View: 9315

This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. As in the first edition, a unifying theme is supervised learning that can be treated as a form of regression analysis. Key concepts and procedures are illustrated with real applications, especially those with practical implications. The material is written for upper undergraduate level and graduate students in the social and life sciences and for researchers who want to apply statistical learning procedures to scientific and policy problems. The author uses this book in a course on modern regression for the social, behavioral, and biological sciences. All of the analyses included are done in R with code routinely provided.

Author: Heping Zhang,Burton H. Singer

Publisher: Springer Science & Business Media

ISBN: 9781441968241

Category: Mathematics

Page: 262

View: 1008

Multiple complex pathways, characterized by interrelated events and conditions, represent routes to many illnesses, diseases, and ultimately death. Although there are substantial data and plausibility arguments supporting many conditions as contributory components of pathways to illness and disease end points, we have, historically, lacked an effective methodology for identifying the structure of the full pathways. Regression methods, with strong linearity assumptions and data-based constraints on the extent and order of interaction terms, have traditionally been the strategies of choice for relating outcomes to potentially complex explanatory pathways. However, nonlinear relationships among candidate explanatory variables are a generic feature that must be dealt with in any characterization of how health outcomes come about. It is noteworthy that similar challenges arise from data analyses in Economics, Finance, Engineering, etc. Thus, the purpose of this book is to demonstrate the effectiveness of a relatively recently developed methodology, recursive partitioning, as a response to this challenge. We also compare and contrast what is learned via recursive partitioning with results obtained on the same data sets using more traditional methods. This serves to highlight exactly where, and for what kinds of questions, recursive partitioning-based strategies have a decisive advantage over classical regression techniques.

Analysis and Inference beyond Models

Author: Bertrand S. Clarke,Jennifer L. Clarke

Publisher: Cambridge University Press

ISBN: 1107028280

Category: Business & Economics

Page: 652

View: 9449

A bold retooling of statistics to focus directly on predictive performance with traditional and contemporary data types and methodologies.

Author: Alan Moses

Publisher: CRC Press

ISBN: 1482258609

Category: Mathematics

Page: 280

View: 1906

Molecular biologists are performing increasingly large and complicated experiments, but often have little background in data analysis. The book is devoted to teaching the statistical and computational techniques molecular biologists need to analyze their data. It explains the big-picture concepts in data analysis using a wide variety of real-world molecular biological examples such as eQTLs, ortholog identification, motif finding, inference of population structure, protein fold prediction and many more. The book takes a pragmatic approach, focusing on techniques that are based on elegant mathematics yet are the simplest to explain to scientists with little background in computers and statistics.

Author: Kevin J. Keen

Publisher: CRC Press

ISBN: 0429632215

Category: Mathematics

Page: 590

View: 2650

Praise for the First Edition

"The main strength of this book is that it provides a unified framework of graphical tools for data analysis, especially for univariate and low-dimensional multivariate data. In addition, it is clearly written in plain language and the inclusion of R code is particularly useful to assist readers’ understanding of the graphical techniques discussed in the book. ... It not only summarises graphical techniques, but it also serves as a practical reference for researchers and graduate students with an interest in data display." -Han Lin Shang, Journal of Applied Statistics

Graphics for Statistics and Data Analysis with R, Second Edition, presents the basic principles of graphical design and applies these principles to engaging examples using the graphics and lattice packages in R. It offers a wide array of modern graphical displays for data visualization and representation. Added in the second edition are coverage of the ggplot2 graphics package, material on human visualization and color rendering in R, on screen, and in print.

Features

* Emphasizes the fundamentals of statistical graphics and best practice guidelines for producing and choosing among graphical displays in R
* Presents technical details on topics such as: the estimation of quantiles, nonparametric and parametric density estimation; diagnostic plots for the simple linear regression model; polynomial regression, splines, and locally weighted polynomial regression for producing a smooth curve; Trellis graphics for multivariate data
* Provides downloadable R code and data for figures at www.graphicsforstatistics.com

Kevin J. Keen is a Professor of Mathematics and Statistics at the University of Northern British Columbia (Prince George, Canada) and an Accredited Professional Statistician™ by the Statistical Society of Canada and the American Statistical Association.

Basic Ideas and Selected Topics

Author: Peter J. Bickel,Kjell A. Doksum

Publisher: CRC Press

ISBN: 1498722709

Category: Business & Economics

Page: 465

View: 6223

Mathematical Statistics: Basic Ideas and Selected Topics, Volume II presents important statistical concepts, methods, and tools not covered in the authors’ previous volume. This second volume focuses on inference in non- and semiparametric models. It not only reexamines the procedures introduced in the first volume from a more sophisticated point of view but also addresses new problems originating from the analysis of estimation of functions and other complex decision procedures and large-scale data analysis. The book covers asymptotic efficiency in semiparametric models from the Le Cam and Fisherian points of view as well as some finite sample size optimality criteria based on Lehmann–Scheffé theory. It develops the theory of semiparametric maximum likelihood estimation with applications to areas such as survival analysis. It also discusses methods of inference based on sieve models and asymptotic testing theory. The remainder of the book is devoted to model and variable selection, Monte Carlo methods, nonparametric curve estimation, and prediction, classification, and machine learning topics. The necessary background material is included in an appendix. Using the tools and methods developed in this textbook, students will be ready for advanced research in modern statistics. Numerous examples illustrate statistical modeling and inference concepts while end-of-chapter problems reinforce elementary concepts and introduce important new topics. As in Volume I, measure theory is not required for understanding. Check out Volume I for fundamental, classical statistical concepts leading to the material in this volume.

Author: Arthur Pewsey,Markus Neuhäuser,Graeme D Ruxton

Publisher: OUP Oxford

ISBN: 0191650765

Category: Mathematics

Page: 192

View: 8914

Circular Statistics in R provides the most comprehensive guide to the analysis of circular data in over a decade. Circular data arise in many scientific contexts, whether as angular directions, such as observed compass directions of departure of radio-collared migratory birds from a release point, bond angles measured in different molecules, wind directions at different times of year at a wind farm, directions of stress fractures in concrete bridge supports, or longitudes of earthquake epicentres; or as seasonal and daily activity patterns, for example data on the times of day at which animals are caught in a camera trap, on 911 calls in New York, or on internet traffic, and variation throughout the year in measles incidence, global energy requirements, TV viewing figures or injuries to athletes. The natural way of representing such data graphically is as points located around the circumference of a circle, hence their name. Importantly, circular variables are periodic in nature and the origin, or zero point, such as the beginning of a new year, is defined arbitrarily rather than necessarily emerging naturally from the system. This book will be of value both to those new to circular data analysis and to those more familiar with the field. For beginners, the authors start by considering the fundamental graphical and numerical summaries used to represent circular data before introducing distributions that might be used to model them. They go on to discuss basic forms of inference such as point and interval estimation, as well as formal significance tests for hypotheses that will often be of scientific interest. When discussing model fitting, the authors advocate reduced reliance on the classical von Mises distribution, showcasing distributions that are capable of modelling features such as asymmetry and varying levels of kurtosis that are often exhibited by circular data. The use of likelihood-based and computer-intensive approaches to inference and modelling is stressed throughout the book. The R programming language is used to implement the methodology, particularly its "circular" package. Also provided are over 150 new functions for techniques not already covered within R. This concise but authoritative guide is accessible to the diverse range of scientists who have circular data to analyse and want to do so as easily and as effectively as possible.

An Algorithmic Perspective, Second Edition

Author: Stephen Marsland

Publisher: CRC Press

ISBN: 1498759785

Category: Computers

Page: 457

View: 4162

A Proven, Hands-On Approach for Students without a Strong Statistical Foundation

Since the best-selling first edition was published, there have been several prominent developments in the field of machine learning, including the increasing work on the statistical interpretations of machine learning algorithms. Unfortunately, computer science students without a strong statistical background often find it hard to get started in this area. Remedying this deficiency, Machine Learning: An Algorithmic Perspective, Second Edition helps students understand the algorithms of machine learning. It puts them on a path toward mastering the relevant mathematics and statistics as well as the necessary programming and experimentation.

New to the Second Edition

* Two new chapters on deep belief networks and Gaussian processes
* Reorganization of the chapters to make a more natural flow of content
* Revision of the support vector machine material, including a simple implementation for experiments
* New material on random forests, the perceptron convergence theorem, accuracy methods, and conjugate gradient optimization for the multi-layer perceptron
* Additional discussions of the Kalman and particle filters
* Improved code, including better use of naming conventions in Python

Suitable for both an introductory one-semester course and more advanced courses, the text strongly encourages students to practice with the code. Each chapter includes detailed examples along with further reading and problems. All of the code used to create the examples is available on the author’s website.