Introduction¶

The Surge of AI: Motivation for This Course¶

Artificial Intelligence (AI) has evolved from a specialized field into a pivotal force shaping modern technology. Through our engagement with students across various disciplines, it is clear that many are intrigued by AI. They range from those engaged in PhD research to others seeking exceptional career opportunities or simply exploring their options. Despite their interest, a common challenge persists: a lack of holistic views about what to learn and how to effectively use AI tools. AI has experienced rapid advancements over the past two decades, with new methodologies and concepts emerging frequently. The terms “AI” and “Generative AI” themselves are broad, encompassing a wide range of disciplines including computer science, statistics, optimization, control, information theory, often leading to disparate interpretations and terminologies across different academic backgrounds. Fortunately, the main concepts and techniques of AI does not evolve as rapidly as it appears. Many thoughts are deeply rooted in established fields, such as how to organize data, set up the optimization problem, and interpretation of results.

This Stat 8931 course is designed with the goal of lowing the barrier for new researchers and inspiring them to pursue state-of-the-art research in the relevant fields. We aim to achieve the goal through on-site implementations, explorations, discussions, and offline communications (as suggested in the syllabus). While many excellent open-source tutorials or courses exist, focusing primarily on practical implementation, our course will be research-oriented and encourage students to critically think the future directions they might take.

Philosophy of Course Arrangement¶

Our discussions will often center around three fundamental elements:

Data: The essential components organized and synthesized into actionable knowledge.
Learning: The process through which we gain insights, predict outcomes, and develop generalizable knowledge from data.
AI: The capability to structure data and learning processes autonomously.

	Practice	Principles
Data	How to organize the available data? What kind of data processing is necessary?	What piece of information is more relevant to the type of learning? How is the information effectively used?
Learning	What are alternative algorithms and their practical considerations? Does the optimization convergence imply any generalizability at the population level?	What are alternative problem formulations? What is the rationale behind the learning problem?
AI	To what extent does a data+learning formulation exhibit autonomy? Can the developed ‘’AI’’ survive in open-world environments where the underlying assumptions are not met?	Is this how human sets objectives when acting?

Following the above framework, we refer to AI as the broad field concerned with creating systems that can perform tasks which would typically require human intelligence. Generative AI specifically refers to the focus on creating models that can generate new data samples by learning and approximating data distributions.

How to Use This Course and Materials¶

Prerequisite¶

This course does not have strict graduate-level prerequisites. However, a basic understanding of statistics or machine learning, along with programming experience in Python, will be advantageous.

Expectation from Students¶

Students are expected to have:

Self-motivation to study independently and a significant amount of time dedicated to learning and homework assignments per week.
A willingness to engage in critical and creative thinking, to propose new research ideas, and to challenge existing practices.

Outline of the Course¶

Chapter 1. Kickoff and Quick Study of Deep Learning (Sept 6, Sept 13) Dive into the essentials of deep learning, exploring core model architectures such as CNNs, VAEs, and ResNets. Understand the computational underpinnings, including frameworks like Pytorch and Tensorflow, optimization methods like SGD and ADAM, and basic principles. We will set up the development environments, including setup of Python, Pytorch, IDE, Github, and optionally Minnesota Supercomputing Institute (MSI) and Huggingface. Finally, we will go through some (superficial) usage of pretrained large language models (LLMs), vision LLMs, text-to-image and text-to-video diffusion models.

Chapter 2. Large Language Modeling (Sept 13, 20) We will dissect the model structures that underlie large language modeling, including various decoding algorithms and their inherent constraints, tokenizer training, positional encoding, attention mechanisms, and explore emergent linguistic abilities at scale such as zero-shot and few-shot learning capabilities.

Chapter 3. Training LLMs from Scratch (Sept 27) We will cover the end-to-end process of building a transformer-based model from scratch. Specifically, we will go through computational tools such as Pytorch and cloud-computing, build model architectures, evaluate models, and examine how different hyperparameter choices could affect model performance.

Chapter 4. Finetuning and Human-Value Alignment (Oct 4) We will introduce techniques used in finetuning LLMs towards particular tasks or preferences. These include instruction finetuning and reinforcement learning from human feedback. We will also explore the connection between practical algorithms and fundamentals in optimization and statistics.

Chapter 5. Computation and Memory Problems in Foundation Models’ Training (Oct 11, Oct 18) We will target strategies for optimizing computation and memory usage in the training and finetuning of large foundational models. Topics include distributed training methodologies, memory-efficient techniques like ZeRO and parameter-efficient finetuning, alongside explorations of mixture-of-experts architectures.

Chapter X. Project Preparation and Proposal Discussion (Oct 18, Oct 25)

Chapter 6. Diffusion Models (Nov 1) We will introduce diffusion models, a class of generative models that have shown remarkable proficiency in synthesizing high-quality data, especially text-to-image data. We will start with variation autoencoders and then discuss the architecture of many state-of-the-art diffusion models. We will explore how such models can enable the multi-scale representations crucial for generating photorealistic images from textual descriptions.

Chapter 7. Retrieval Augmented Generation (Nov 8) We will examine how Retrieval-Augmented Generation (RAG) can improve the performance of LLMs in knowledge-intensive tasks, examining its computational scalability and safety.

Chapter 8. Efficient Deployment Strategies for Foundation Models (Nov 15) We will introduce standard techniques to reduce sizes of standard deep model and transformer models, such as compiling, quantization, pruning, knowledge distillation, their applications to model deployment, and the statistical rationales.

Chapter 9. Ethics and Safety in Generative AI (Nov 22) As we navigate the complexities and capabilities of AI technologies, ethical considerations will form an integral part of our curriculum. We will introduce quantitative methods and metrics to assess the safety of generative models from two perspectives. One is from an ethics perspective, including fairness and toxicity. We will also delve into content moderation techniques grounded in statistical detection and watermarking techniques. The other is from a machine learning security perspective. We will study several angles that practical AI systems must counteract, including adversarial examples, privacy, data poisoning backdoor attacks, membership inference attacks, model-stealing attacks, and their statistical foundations. The lecture will also discuss how these security issues are arising in large generative models.

Chapter Y. Final Project Presentation and Discussion (Dec 6) In summary, this course is not an introduction of generative AI tools. It is a research-oriented course designed for PhD students to quickly learn the key toolsets and principles that drive the field. By offering hands-on experiences and highlighting theoretical bases, we aim to equip students to apply their knowledge across scientific domains and foster innovative AI-for-science initiatives.

How to Gain from the Course¶

This website contains the main lecture notes created by GenAI course team. Each chapter will consist of background of the problem, formulations, principles, and implementation strategies. In line with the motivation and philosophy of the course design, the course will be a highly interweaved mixture of practical coding, study of basic principles, and open problems raised to stimulate further thinking. Specifically, we will have a significant portion of the course material dedicated to code implementation so you can immediately apply the learned techniques to various domain problems.

This website is home to the GenAI course lecture notes crafted by Jie Ding and the course design team. Each chapter is structured to cover the background of the topic, problem formulations, underlying principles, and practical implementation strategies. Reflecting the course’s philosophy, the content is an integrated blend of hands-on coding exercises, theoretical exploration, and discussions on open problems designed to encourage further inquiry and innovation.

Software Environment A substantial portion of the course is devoted to coding implementations, allowing you to apply the techniques you learn directly to problems across various domains. Python is chosen for its versatility and simplicity, making it ideal for both beginners and experienced programmers. It is widely used in the AI community due to its extensive libraries and community support. Detailed instructions on setting up this environment will be covered at the end of this introduction., Necessary dependencies will be provided to ensure you can seamlessly transfer code into a Jupyter notebook within each chapter.

Hardware Environment Thanks to the generous support from Minnesota Supercomputing Institute (MSI), all enrolled students will receive an educational account providing access to a GPU cluster equipped with a Slurm fair queuing system. This setup allows you to utilize necessary computational resources to practice the techniques learned and to conduct research projects related to the course.

(One-time) Quick Setup of Python Environment¶

This guide provides step-by-step instructions on how to create a Python virtual environment named GenAI using Python 3.8, and then how to install packages using a requirements.txt file.

Approach One: Using Python’s Built-in `virtualenv` Tool¶

This approach has a faster setup and takes less disk space compared with Approach Two, ideal for those who need a lightweight setup.

Step 1: Install Python: Ensure that Python 3.8 or above is installed on your system. You can download it from python.org. Alternatively, use command line as follows. On Windows, you can search for Command Prompt in the Start menu. On macOS or Linux, open the Terminal application.

On Windows: first install Chocolatey (a package manager for Windows) and then can install Python using
```
choco install python --version=3.8
```
On macOS or Linux: first install Homebrew (a popular package manager) and then install Python using
```
 brew install python@3.8
```

Step 2: Create a Virtual Environment: Navigate to your project directory, and create a virtual environment named GenAI using Python 3.8.

On Windows:

cd path/to/your-project
py -3.8 -m venv GenAI

On macOS or Linux:

cd path/to/your-project
python3.8 -m venv GenAI

Step 3: Activate the Virtual Environment

On Windows:
```
.\GenAI\Scriptsctivate
```
On macOS or Linux:
```
source GenAI/bin/activate
```

Once activated, your terminal prompt should change to indicate that you are now working within the GenAI virtual environment. You have now successfully created a Python environment named GenAI, which is isolated from other Python environments on your system. To deactivate the environment and return to your global Python environment, run:

deactivate

Step 4: Install Packages: Install one or more packages using

pip install package_name1 package_name2 ...

If you have a requirements.txt file that lists all the packages you want to install along with their versions, you can batch-wise install them by running

pip install -r requirements.txt

Approach Two: Using Conda¶

This approach uses Anaconda or Miniconda for a more integrated environment management. Conda commands are identical across operating systems.

Step 1: Install Anaconda or Miniconda: For a full-featured environment with numerous pre-installed packages, download Anaconda. For a more minimal installation, download Miniconda.

Step 2: Create a New Conda Environment

 conda create --name GenAI python=3.8

Step 3: Activate or Deactivate the Conda Environment

 conda activate GenAI
 conda deactivate

Step 4: Install Packages: Install one or more packages using

 pip install package_name1 package_name2 ...

For our course, you will likely need the following packages:

 pip install accelerate datasets diffusers gym huggingface-hub ipykernel matplotlib numpy opencv-python pandas transformers 

If you have a requirements.txt file that lists all the packages you want to install along with their versions, you can batch-wise install them by running

pip install -r requirements.txt

For Jupyter Notebook users, to ensure the newly created Conda environment named GenAI shows up in Jupyter Notebook, you need to install the ipykernel package in that environment and then register it with Jupyter.

pip install notebook ipykernel
python -m ipykernel install --user --name GenAI --display-name "Python 8 (GenAI)"
jupyter notebook

For MSI ondemand cloud users, remember to add the following lines as attached when launching a jupyter in ondemand, so you access the right Python path once activating GenAI.

module load conda
module load cuda

Skill Sets¶

Here are resources to help you develop skills for conducting advanced R&D. While Python and PyTorch are required, familiarity with other technologies listed will greatly enhance your learning experience and capability to handle advanced topics.

Learn Python¶

Python is a fundamental skill for anyone working in data science and AI. Here are some great resources to get started or to sharpen your Python skills:

Python Official Tutorial - Ideal for beginners.
Real Python - Offers in-depth tutorials and explanations on various Python topics.
Automate the Boring Stuff with Python - A practical and fun approach to learning Python.

Learn Pytorch¶

PyTorch is one of the leading libraries for deep learning research. Here’s where you can learn more about it:

Install it immediately after setting up the Python environment!

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 # customized based on https://pytorch.org/get-started/locally/

PyTorch Official Tutorials - Comprehensive tutorials from the official PyTorch website.
Deep Learning with PyTorch: A 60 Minute Blitz - A quick introduction to the core concepts of PyTorch.

Learn CUDA¶

CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing. Here are some tutorials to get started:

CUDA Tutorials on PyTorch - Learn how to extend PyTorch with custom C++ and CUDA extensions.
GPU Puzzles - Practice your GPU programming skills with these puzzles.

Learn Slurm¶

Slurm is a highly configurable open-source workload manager. Here are some resources to learn more about managing computing workloads:

Job submission and scheduling with Slurm - MSI-provided guideline of using Slurm.

Learn Kubernetes¶

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

Kubernetes documentations - Kubernetes basics.
Kubernetes Course - Full Beginners Tutorial - A video tutorial for beginners.