This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimum human effort.
Detailed analysis of text data requires understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the “shallow” but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications.
What you will learn
You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
During this module, you will learn the overall course design, an overview of natural language processing techniques and text representation, which are the foundation for all kinds of text-mining applications, and word association mining with a particular focus on mining one of the two basic forms of word associations (i.e., paradigmatic relations).
During this module, you will learn more about word association mining with a particular focus on mining the other basic form of word association (i.e., syntagmatic relations), and start learning topic analysis with a focus on techniques for mining one topic from text.
During this module, you will learn topic analysis in depth, including mixture models and how they work, Expectation-Maximization (EM) algorithm and how it can be used to estimate parameters of a mixture model, the basic topic model, Probabilistic Latent Semantic Analysis (PLSA), and how Latent Dirichlet Allocation (LDA) extends PLSA.
During this module, you will learn text clustering, including the basic concepts, main clustering techniques, including probabilistic approaches and similarity-based approaches, and how to evaluate text clustering. You will also start learning text categorization, which is related to text clustering, but with pre-defined categories that can be viewed as pre-defining clusters.