Introduction
Please note that this is a preliminary course description. The final version will be published in June 2026.
This course teaches how to develop scalable, maintainable, and efficient data science algorithms using object-oriented programming. It emphasizes handling large datasets, improving productivity, and supporting collaborative coding practices. From a business standpoint, the course also covers transitioning models from experimentation to production, focusing on the full data science project lifecycle, including data preparation, model training, evaluation, and validation through machine learning pipelines.
Course content
- Classes, constructors, and methods
- Further topics in object-oriented programming:
- Inheritance, superclasses, and subclasses
- Version control with Git and GitHub:
- Pull, commit, and push.
- Branching and merging
- Technologies for handling large amounts of data.
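To give a flavour of the object-oriented topics listed above, here is a minimal sketch of a class with a constructor and methods, plus a subclass that inherits from it. The names `Model`, `MeanModel`, `fit`, `predict`, and `describe` are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
class Model:
    """Hypothetical superclass, used only for illustration."""

    def __init__(self, name):
        # The constructor stores state on the new instance.
        self.name = name

    def predict(self, x):
        # Subclasses are expected to override this method.
        raise NotImplementedError("Subclasses must implement predict().")

    def describe(self):
        # type(self).__name__ resolves to the subclass name at runtime.
        return f"{self.name} ({type(self).__name__})"


class MeanModel(Model):
    """Hypothetical subclass: always predicts the mean of the training targets."""

    def __init__(self, name):
        super().__init__(name)  # reuse the superclass constructor
        self.mean = None

    def fit(self, targets):
        self.mean = sum(targets) / len(targets)
        return self  # returning self allows method chaining

    def predict(self, x):
        return self.mean


model = MeanModel("baseline").fit([1.0, 2.0, 3.0])
print(model.describe())   # baseline (MeanModel)
print(model.predict(10))  # 2.0
```

The subclass inherits `describe` unchanged and overrides only `predict`, which is the kind of reuse that inheritance is meant to support.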
The course then moves on to advanced Python programming, with a particular focus on the lifecycle of data science projects, including:
- Understanding business problems.
- Data curation and preprocessing:
- Reading and writing data “manually” to files in Python.
- Handling missing values, outliers, and other data anomalies.
- Database infrastructures, and exploring and curating data with pandas.
- Writing analysis queries that answer business questions.
- Creating training data using queries.
- Feature engineering.
- Model building and deployment.
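As a rough illustration of the data curation items above, the following sketch writes a small table to a file, reads it back, and fills a missing value with the median of the observed values. It uses only the Python standard library so that it runs on its own; the file name, column names, and data are made up for the example, and median imputation is just one possible strategy, not necessarily the one taught in the course.

```python
import csv
import os
import tempfile
from statistics import median

# Made-up example data; an empty string marks a missing value.
rows = [
    {"customer": "a", "spend": "10.0"},
    {"customer": "b", "spend": ""},
    {"customer": "c", "spend": "30.0"},
]

# Write the data "manually" to a CSV file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "spend.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["customer", "spend"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; every field arrives as a string.
with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))

# Compute the median of the observed values and use it to fill the gap.
observed = [float(r["spend"]) for r in loaded if r["spend"] != ""]
fill_value = median(observed)

cleaned = [
    {
        "customer": r["customer"],
        "spend": float(r["spend"]) if r["spend"] != "" else fill_value,
    }
    for r in loaded
]
print(cleaned)
```

With pandas, the same steps would typically be expressed with `read_csv` and `fillna`, which is more convenient once datasets grow beyond a few rows.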