Principal Component Analysis And Randomness Tests For Big Data Analysis

Principal Component Analysis and Randomness Test for Big Data Analysis

Author: Mieko Tanaka-Yamawaki
language: en
Publisher: Springer Nature
Release Date: 2023-05-23
This book presents a novel approach to analyzing large, rectangular numerical data sets (so-called big data). The essence of this approach is to grasp the "meaning" of the data instantly, without getting into the details of individual data points. Unlike conventional approaches to principal component analysis, randomness tests, and visualization methods, the authors' approach offers universality and simplicity of data analysis, regardless of data type, structure, or field of science. First, the mathematical preparation is described. The RMT-PCA and the RMT-test utilize the cross-correlation matrix of time series, $C = XX^T$, where X is a rectangular matrix of N rows and L columns and $X^T$ is the transpose of X. Because C is symmetric, namely $C = C^T$, it can be converted to a diagonal matrix of eigenvalues by a similarity transformation $SCS^{-1} = SCS^T$ using an orthogonal matrix S. When N is sufficiently large, the histogram of the eigenvalue distribution can be compared to the theoretical formula derived in the context of random matrix theory (RMT). Then the RMT-PCA is applied to high-frequency stock prices in Japanese and American markets. This approach proves effective in extracting "trendy" business sectors of the financial market over the prescribed time scale. In this case, X consists of N stock price series of length L, and the correlation matrix C is an N by N square matrix whose element at the i-th row and j-th column is the inner product of the price time series of the i-th stock and the j-th stock, both of length L. Next, the RMT-test is applied to measure the randomness of various random number generators, including algorithmically generated random numbers and physically generated random numbers.
The book concludes by demonstrating two applications of the RMT-test: (1) a comparison of hash functions, and (2) stock prediction by means of randomness, including a new index of off-randomness related to market decline.
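The description does not spell out the "theoretical formula" from random matrix theory. A minimal sketch, assuming it refers to the Marchenko-Pastur law for a correlation matrix built from rows standardized to zero mean and unit variance (with C scaled as XX^T / L), might look like the following. The function names mp_bounds and mp_density are hypothetical and chosen only for illustration.

```python
import math


def mp_bounds(n_rows, length):
    """Edges (lambda_minus, lambda_plus) of the Marchenko-Pastur spectrum.

    Assumes Q = length / n_rows >= 1 and unit-variance entries.
    """
    q = length / n_rows
    if q < 1:
        raise ValueError("this sketch assumes L >= N, i.e. Q >= 1")
    root = 1.0 / math.sqrt(q)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def mp_density(lam, n_rows, length):
    """Marchenko-Pastur eigenvalue density at lam (zero outside the edges)."""
    q = length / n_rows
    lo, hi = mp_bounds(n_rows, length)
    if lam <= lo or lam >= hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam


if __name__ == "__main__":
    # Example sizes (assumed): N = 100 series of length L = 400, so Q = 4.
    lo, hi = mp_bounds(100, 400)
    print("random-like eigenvalues are expected between", lo, "and", hi)
```

Under this assumption, eigenvalues of an empirical correlation matrix that lie well above the upper edge would be candidates for non-random structure, which is in the spirit of the principal components the RMT-PCA aims to extract.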
Principal Component Analysis and Randomness Tests for Big Data Analysis

This book presents a novel approach to analyzing large numerical data sets (so-called big data). The essence of this approach is to grasp the "meaning" of the data instantly, without getting into the details of individual data points. Unlike conventional approaches to principal component analysis, randomness tests, and visualization methods, the authors' approach offers universality and simplicity of data analysis, regardless of data type, structure, or field of science. First, the mathematical preparation is described. The RMT-PCA and the RMT-test utilize the cross-correlation matrix of time series, $C = XX^T$, where X is a rectangular matrix of N rows and L columns and $X^T$ is the transpose of X. The RMT-PCA uses N sample time series of length L. The RMT-test uses N pieces of length L obtained by cutting a single data sequence into N pieces. Because C is symmetric, namely $C = C^T$, it can be converted to a diagonal matrix of eigenvalues by a similarity transformation $SCS^T$ using an orthogonal matrix S. When N is sufficiently large, the histogram of the eigenvalue distribution can be compared to the theoretical formula derived in the context of random matrix theory (RMT). Then the RMT-PCA is applied to high-frequency stock prices in Japanese and American markets. This approach proves effective in extracting "trendy" business sectors of the financial market over the prescribed time scale. In this case, X consists of N stock price series of length L, and the correlation matrix C is an N by N square matrix whose element at the i-th row and j-th column is the inner product of the price time series of the i-th stock and the j-th stock, both of length L. Next, the RMT-test is applied to measure the randomness of various random number generators, including algorithmically generated random numbers and physically generated random numbers.
The book concludes by demonstrating three applications of the RMT-test: (1) a comparison of hash functions, (2) choice of safe stocks, and (3) prediction of stock index by means of a sudden change of randomness.
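The cutting step described above for the RMT-test can be sketched as follows. This is an illustration under stated assumptions, not the book's procedure: each piece is standardized, C is scaled as XX^T / L, and the comparison statistic is the second moment of the eigenvalue distribution, computed as trace(C^2) / N and compared with the Marchenko-Pastur value 1 + N/L. All function names here are hypothetical.

```python
import math
import random


def cut_into_rows(seq, n_rows, length):
    """Cut one sequence into n_rows consecutive pieces of the given length."""
    if len(seq) < n_rows * length:
        raise ValueError("sequence too short for N pieces of length L")
    return [seq[i * length:(i + 1) * length] for i in range(n_rows)]


def standardize(row):
    """Shift and scale a row to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    if var == 0:
        raise ValueError("a constant row cannot be standardized")
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in row]


def correlation_matrix(rows):
    """C[i][j] = inner product of rows i and j, divided by the length L."""
    n, length = len(rows), len(rows[0])
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            v = sum(a * b for a, b in zip(rows[i], rows[j])) / length
            c[i][j] = c[j][i] = v
    return c


def second_moment_gap(seq, n_rows, length):
    """trace(C^2)/N minus the assumed random-matrix value 1 + N/L."""
    rows = [standardize(r) for r in cut_into_rows(seq, n_rows, length)]
    c = correlation_matrix(rows)
    m2 = sum(v * v for row in c for v in row) / n_rows
    return m2 - (1.0 + n_rows / length)


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(40 * 160)]
    wave = [math.sin(2 * math.pi * k / 16) for k in range(40 * 160)]
    print("pseudo-random gap:", second_moment_gap(noise, 40, 160))
    print("periodic gap:", second_moment_gap(wave, 40, 160))
```

A gap near zero is consistent with a random-looking sequence under these assumptions, while a strongly periodic sequence produces a large positive gap because its pieces are highly correlated with one another.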
Perspectives on Big Data Analysis

Author: S. Ejaz Ahmed
language: en
Publisher: American Mathematical Society
Release Date: 2014-08-20
This volume contains the proceedings of the International Workshop on Perspectives on High-dimensional Data Analysis II, held May 30-June 1, 2012, at the Centre de Recherches Mathématiques, Université de Montréal, Montréal, Quebec, Canada. This book collates applications and methodological developments in high-dimensional statistics dealing with interesting and challenging problems concerning the analysis of complex, high-dimensional data, with a focus on model selection and data reduction. The chapters deal with submodel selection and parameter estimation for an array of interesting models. The book also presents some surprising results on high-dimensional data analysis, especially when signals cannot be effectively separated from the noise; it provides a critical assessment of penalty estimation when the model may not be sparse, and it suggests alternative estimation strategies. Readers can apply the suggested methodologies to a host of applications and can also extend these methodologies in a variety of directions. This volume conveys some of the surprises, puzzles, and success stories in big data analysis and related fields. This book is co-published with the Centre de Recherches Mathématiques.