Introduction to Big Data

Rapid advances in engineering and the computerization of all aspects of social and economic activity have produced large volumes of unstructured data, including web logs, videos, voice recordings, photos, e-mails, and tweets. In addition, high-throughput technologies generate large samples in a short time, yielding large amounts of complex data such as biological and medical data. These characteristics of modern big data are commonly summarized as the five Vs: Volume, Variety, Velocity, Veracity, and Value. The key objective of this course is to familiarize students with the most important information technologies used for manipulating, storing, and analyzing big data. The course covers fundamental statistical analysis and machine learning algorithms using two representative programming languages, R and Python; most lectures will be presented using R examples. It also aims to equip students with the basic knowledge expected of software engineers in the Big Data era by introducing representative frameworks and systems, such as Spark 2.0, NoSQL storage solutions, VoltDB, and SciDB, which are taken up again in the advanced Big Data courses.