Course Overview

The Data Analytics course introduces the basics and usefulness of data analytics and develops hands-on skills for applying its tools and techniques to engineering problems: collecting and reshaping raw data in different formats, implementing advanced visualizations to draw usable conclusions, and using productivity tools to organize data analytics projects and generate reproducible reports.

The course content includes:

  • Fundamentals of data analytics
  • Programming basics for data analytics (basic syntax, data types, vectors and vector arithmetic, indexing, conditional expressions, loops and iteration, functions)
  • Principles and techniques of data visualization
  • Data wrangling: import, tidy and process data (importing spreadsheets, web scraping, reshaping data, combining tables, string processing)
  • Productivity tools for organized and reproducible data analytics projects (basic Unix, Git and GitHub, markup languages for reproducible reports)

I designed this course and continue to teach it to Industrial Engineering students at Hacettepe University as part of my faculty responsibilities. You may visit the department’s website for comprehensive information about IE Hacettepe, and the course catalog for the full curriculum, which includes details on EMU430 - Data Analytics.

Course Structure

The course runs for 15 weeks and targets both undergraduate and entry-level graduate students. It is also well suited to more advanced learners who lack a background in programming for data analytics. The schedule reserves one week each for the midterm exam, final exam, and project presentations, which leaves 12 weeks for instruction. For these 12 weeks, I use my lecture notes (see Lectures). The content of these notes is primarily derived from Dr. Rafael Irizarry’s book, Introduction to Data Science (see References).

The course currently uses R as the programming language and Quarto for code-based reproducible reporting. It also uses GitHub, GitHub Classroom, and DataCamp as productivity platforms. The course can be taught using any programming language, but R and Python are the most compatible with the current design.

GitHub

The course relies heavily on Git and GitHub. Specifically, the class is managed through GitHub Classroom. To put it simply, GitHub Classroom allows us to use a shared account, EMU Hacettepe Analytics. I give two types of assignments here. The first is an individual task in which students create their own repositories to host their personal websites. These websites, built with Quarto, must keep a main menu item labeled “EMU430 Coursework” throughout the 15-week course. Under this menu, students post their individual assignments and a link to their project repositories. This leads to the second assignment, a group task in which students form teams for their term project. Each team is given a repository to host its project website, which should feature team details, data insights, exploratory data analysis, and in-depth analytical work.

This approach is similar to the one employed by Dr. Berk Orbay in his BDA 503 - Essentials of Data Analytics course at MEF University. He has experience with this framework and confirmed its efficacy to me during the planning stage of my course.

Quarto

Quarto is a versatile documentation tool that integrates code, data, and narrative in a single document. It works with both R and Python. Quarto and GitHub work well together: students render their code-based Quarto documents (e.g., websites) on their local computers and then commit and push them to GitHub. Setting the repository’s Pages settings to deploy from the /docs folder of the branch is then sufficient for GitHub to publish their webpages automatically. This process is both easy and efficient.
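
The publishing workflow above depends on Quarto writing the rendered site into a docs folder. A minimal sketch of the project configuration, assuming a Quarto website project with `_quarto.yml` at the repository root (the site title, navigation, and theme that a real student site would also define are omitted here):

```yaml
# _quarto.yml (illustrative sketch; title, navbar, and theme omitted)
project:
  type: website
  output-dir: docs   # render the site into docs/ so GitHub Pages can serve it
```

With a configuration like this, running `quarto render` locally writes the site to `docs/`, which is then committed and pushed along with the sources. Quarto's documentation also suggests adding an empty `.nojekyll` file at the repository root so that GitHub Pages does not process the site with Jekyll.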

DataCamp

I use DataCamp Classroom for my students, which gives them complimentary access to DataCamp’s paid content. Here, I assign tasks that count as bonus points towards their grades. I highly recommend this approach to colleagues who teach analytical courses involving programming languages. The application and verification process is straightforward. Once your faculty status is confirmed, all that’s required is to send invitations to your students’ email addresses. Upon their acceptance, they are added to your classroom, allowing you to assign them courses and tasks. As they complete each piece of content, they earn XP points. You can monitor their progress, and there’s even a leaderboard that my students enjoy checking at the beginning of each lecture to see the weekly leaders.

Invited Speakers

I designed this course to include four weeks of guest lectures from Data Science experts. These speakers share their knowledge and real-world experiences, giving students a comprehensive view of practical applications. In the Fall term of 2023-2024, we welcomed four invited speakers.

The first three talks were conducted remotely, and I recorded them and uploaded the videos to my YouTube channel. I highly recommend them to those interested. Please note that the talks are in Turkish.

References

My course content draws upon the excellent references listed below. Specifically, my slides are derived from Dr. Rafael Irizarry’s exceptional book. He has recently refreshed his material, resulting in two outstanding parts:

My other references are:

My GitHub Classroom approach is inspired by the strategy used by Dr. Berk Orbay in his BDA 503 - Essentials of Data Analytics course at MEF University. Additionally, I have drawn inspiration from his assignment and project guidelines.

Of course, the content goes beyond these references and incorporates both my insights and the students’ experiences. Specifically, the individual and project websites created by IE Hacettepe students significantly enrich our content. These students have developed excellent projects from which we can learn a great deal.

Syllabus

The syllabus I used for EMU430-Data Analytics during the Fall Semester of 2023-2024 at IE Hacettepe is outlined below:

EMU430 - Data Analytics, 2023-2024 Fall Semester Syllabus

Lectures

The lecture notes I used for EMU430-Data Analytics during the Fall Semester of 2023-2024 at IE Hacettepe are listed below. I mainly derived my slides from Dr. Rafael Irizarry’s Introduction to Data Science. While they may not be flawless, I’ve invested considerable time in shaping them, and I expect them to continue to develop and mature over the next few years.

2023-2024 Fall Term

Students performed exceptionally well on both individual and team projects. Our GitHub Classroom area, EMU Hacettepe Analytics, hosts the individual student and project pages for the Fall 2023-2024 term.

Individual Student Pages

Photo of Aykut Simsek

Out of 73 students, several excelled in personalizing their websites with exceptional design and content. I’m highlighting three notable examples below. Aykut Simsek, who stood out for his meticulous attention to his website throughout the term, received a book from me as a gift at the semester’s end (Aykut and I are in the photo).

Project Topics and Pages

The term projects resulted in unexpectedly high-quality work from the students. Below are the project pages and the issues they addressed. The first four were my favorites, but most are worth exploring.

  • Team Safe İstanbul: This project aims to analyze three datasets from the Istanbul Metropolitan Municipality to identify districts at high risk of earthquakes. Using data on building numbers, construction years, floors, and population, the team will create a risk map and prioritize neighborhoods for earthquake preparedness.

  • synergy: Focusing on the cultural sector, this project examines statistics on libraries and museums across Turkish cities. By analyzing visitor numbers, registered users, and book counts, the team intends to assess cultural institution utilization and highlight areas for development.

  • CTRL-S: This project explores the competitive landscape of mobile communication technologies and market shares of mobile vendors globally, with a focus on how country-specific dynamics influence market positions.

  • Data Ciphers: Through the Türkiye Health Survey, this project analyzes health indicators, disease prevalence, alcohol use, and body mass indices to provide insights into national health trends and facilitate international health comparisons.

  • data_vizards: Analyzing 2021 internal migration data in Turkey, this project explores migration trends, demographic factors, and economic indicators to understand regional migration patterns and their implications.

  • EMUTrend Explorers: This project examines student preferences for Industrial Engineering departments in Ankara universities, focusing on demand and placement trends using the YOK ATLAS dataset. The analysis includes data cleaning to focus on preferences of students with full scholarships and aims to understand trends in department selection.

  • icaRdi: The aim of this project is to examine the relationship between team performance metrics and identify the metrics that significantly influence team ratings.

  • semicolon: Focused on developing strategies for suicide prevention across different age groups and genders, aiming to identify effective approaches to reduce suicide rates.

  • 4k1e_rda: Examines crime data from 2011 to 2020 with a focus on the gender and educational status of criminals, using data from the Turkish Statistical Institute to analyze crime trends.

  • mission prediction: Analyzes the potential impact of a 7.5 magnitude earthquake in Istanbul, focusing on infrastructure and population risks. The study aims to guide local authorities and emergency responders in enhancing the city’s resilience and preparedness.

  • red_flag: Analyzing and comparing Consumer Price Index (CPI) data from Turkey and the US to understand inflation trends and the economic and sociological factors influencing these trends.

  • Altı Üstü Data: Investigating the relationship between air pollution and forestation, this study uses data on PM10 levels and forest area per square kilometer to analyze and understand the impact of forestation on air quality.

  • data_criminals: A straightforward project aimed at understanding the correlation between educational levels and crime rates.

  • Germen Obasi: Investigating the relationship between education and happiness using TUIK data on happiness rates and literacy rates from 2003 to 2022. The study aims to explore if educational status correlates with happiness levels and seeks to demonstrate the potential independence of these factors.

  • Quant Flare: This study analyzes Turkey’s population growth rate and the number of foreign babies, relying on TURKSTAT data to understand demographic changes and their implications for Turkey’s future.