Problem Set #5 - Data-Driven Reduced Order Modeling - Data-Driven Modeling in Science and Engineering / Spring 2026

If you use notebooks, provide a brief explanation for each question using the markdown functionality. I encourage you to discuss the problems, but when you write down the solution please do it on your own. Don’t use LLMs without explaining how you’ve used them (if you don’t disclose usage, you might lose points). You can raise and answer each other’s questions on Slack; but please don’t provide exact solutions. Submit Jupyter notebooks (or scripts) for the coding problem on Moodle.

Due: March 25, 2026

Problem 1: POD, DMD, and Reduced Order Modeling (80 points)

In chapter 12 of Data-Driven Science and Engineering, the authors discuss the concept of Reduced Order Modeling (ROM). The idea is to approximate the solution of a high-dimensional system of equations by projecting the solution onto a low-dimensional subspace. This is particularly useful when the high-dimensional system is computationally expensive to solve.

In systems where the variable depends on space and time \((x, t)\), the basic idea is to decompose the solution into a sum of spatial and temporal modes:

\[u(x, t) \approx \sum_{k=1}^{n} a_k(t) \psi_k(x)\]

where \(\psi_k(x)\) are the spatial modes and \(a_k(t)\) are the temporal modes. This eigenfunction expansion is the basic assumption behind the Proper Orthogonal Decomposition (POD) when applied to partial differential equations. One possibility for choosing the spatial modes is to use the Fourier basis, i.e. \(\psi_k(x) = \exp(i 2\pi k x/L)\). This is reminiscent of the analytical solution of the heat equation, where the solution can be written as a sum of sines and cosines.

When solution data for \(u(x, t)\) are given, the spatial modes can be obtained by performing a Singular Value Decomposition (SVD) of the data matrix (where each column is a snapshot of the solution at a given time): \(X = U \Sigma V^T\). The spatial modes are then given by the left singular vectors \(U\), and the temporal modes are given by the right singular vectors \(\Sigma V^T\) (see section 12.6).

In this exercise, we will explore a few approaches described in section 12.6, and apply them to the video of an oscillating spring.

The question we want to answer is: given the first \(m\) frames of the video of a spring oscillation, can we predict the next \(n\) frames?

Data: Download the spring oscillation video (spring-osc-trim.mp4). Starter code for reading the video with OpenCV is available in this Colab notebook.

Part I: POD and Neural Network Time-Stepper (50 points)

a) Read the spring oscillation video using the starter code. The code converts frames to grayscale and decreases the resolution. Crop the frames and decrease the resolution as needed to decrease computational cost.

b) Stack the frames into a data matrix \(X\) where each column is a snapshot of the solution at a given time. Remove the mean frame from the data matrix and perform a Singular Value Decomposition (SVD) of the data matrix \(X = U \Sigma V^T\). The spatial modes are then given by the left singular vectors \(U\), and the temporal modes are given by the right singular vectors \(\Sigma V^T\). Plot the first 5 spatial modes, and the first 5 temporal modes. Note that

\[\Sigma V^T = \begin{bmatrix} \vert & \vert & & \vert \\ \mathbf a_1 & \mathbf a_2 & \ldots & \mathbf a_m \\ \vert & \vert & & \vert \\ \end{bmatrix}\]

where \(\mathbf a_i\) is the \(i\)-th snapshot of the original data over which the simplified model will be constructed.

c) How many modes do you need to capture 95% of the energy (variance) of the system? Plot the cumulative energy (eigenvalues) of the system as a function of the number of modes.

d) Use the number of modes (\(p\)) discovered in the previous question to fit a neural network time-stepper that maps \(\mathbf a^{(p)}_k\) to \(\mathbf a^{(p)}_{k+1}\), where \(\mathbf a^{(p)}\) indicates the first \(p\) time-modes. For this question, use a fully connected neural network \(f_\mathbf{w}\) to fit the model:

\[\mathbf a^{(p)}_{k+1} = f_\mathbf{w}(\mathbf a^{(p)}_k)\]

Use the first 70% of the frames as training data, the next 20% as validation data, and the last 10% as test data. Be careful to maintain the temporal structure of the data if you shuffle it.

e) Having obtained the weights \(\mathbf{w}\) of the neural network, evaluate your model on the validation data. Start with the last time step of your training data, and use the \(p\) dominant singular vectors that you obtained from the training set \(U_p\). Use the neural network to predict the frames for all the time steps in the validation data, by iterating through \(f_\mathbf{w}()\). Once you obtain the predictions \([\mathbf a_{r+1}, \mathbf a_{r+2}, \ldots, \mathbf a_{r+n}]\), use the spatial modes to reconstruct the predicted frames.

Plot the absolute difference between predicted frames and the actual frames for the last 5 frames of your validation set.
Does your loss improve if you use more modes?
Adjust your hyperparameters (network architecture, number of epochs etc.) to improve your results, and finally evaluate your model on the test data.
Plot the mean square error between the predicted frames and the actual frames as a function of time for your test set, where the first frame of the test set is given.
Increase \(p\) and evaluate its effect on the performance of the model.

f) Repeat the same exercise using a different time-series model for the latent variable (e.g. LSTM, Autoregressive model, SINDy, etc.). Compare the results with the previous question and comment on your results.

Part II: Dynamic Mode Decomposition (30 points)

In Part I, you used SVD to extract spatial modes that maximize variance. However, SVD modes can mix multiple frequencies; they are statistically optimal but not necessarily dynamically coherent. Dynamic Mode Decomposition (DMD) addresses this by extracting modes that are eigenvectors of the best-fit linear dynamics operator. Each DMD mode oscillates at exactly one frequency with a well-defined growth or decay rate (see Chapter 7 of Data-Driven Science and Engineering).

Using the same spring oscillation data from Part I (mean-subtracted snapshot matrix), apply DMD and compare with POD.

g) Form the shifted snapshot matrices from the mean-subtracted data:

\[X = [\mathbf{x}_1 \; \mathbf{x}_2 \; \ldots \; \mathbf{x}_{m-1}], \quad X' = [\mathbf{x}_2 \; \mathbf{x}_3 \; \ldots \; \mathbf{x}_{m}]\]

Implement the DMD algorithm:

Compute the truncated SVD of \(X\): \(X \approx U_r \Sigma_r V_r^T\) (use the same rank \(r\) you found in Part I(c)).
Form the reduced dynamics operator: \(\tilde{A} = U_r^T X' V_r \Sigma_r^{-1}\).
Compute eigenvalues \(\lambda_i\) and eigenvectors \(W\) of \(\tilde{A}\).
Recover the DMD modes: \(\Phi = X' V_r \Sigma_r^{-1} W\).

Plot the DMD eigenvalues \(\lambda_i\) in the complex plane. What do their magnitudes and angles tell you about the dynamics of the spring? Are any modes growing, decaying, or neutrally stable?

h) For each DMD mode, extract the frequency and growth rate:

\[\omega_i = \frac{\angle \lambda_i}{\Delta t}, \qquad \sigma_i = \frac{\ln \lvert \lambda_i \rvert}{\Delta t}\]

Plot the DMD spatial modes (reshape each column of \(\Phi\) back to the frame dimensions). Compare these visually with the POD modes from Part I(b). How do they differ?

i) Reconstruct the video using DMD. The reconstruction at time step \(k\) is:

\[\mathbf{x}(k) = \sum_{i=1}^{r} b_i \lambda_i^k \boldsymbol{\phi}_i\]

where \(b_i\) are the initial amplitudes obtained from \(\mathbf{b} = \Phi^\dagger \mathbf{x}_1\) (\(\Phi^\dagger\) is the pseudo-inverse of the mode matrix). Use this formula to predict frames beyond the training set (the same validation/test split as Part I). Plot the reconstruction error over time and compare with the neural network time-stepper from Part I(e).

j) In 3-5 sentences, discuss: When would you prefer DMD over the POD + neural network approach from Part I, and vice versa? Consider interpretability, computational cost, and the assumption of linearity.

Problem 2: Dynamical Systems with Neural Networks (20 points)

The Lorenz system is a system of ordinary differential equations (ODEs) that was developed by Edward Lorenz in the 1960s to describe the behavior of a simple climate model. The system is often used as an example of chaotic systems, or what is known in popular media as the “butterfly effect”. It is given by the following set of ODEs:

\[\begin{align*} \frac{dx}{dt} &= \sigma(y-x) \\ \frac{dy}{dt} &= x(\rho-z) - y \\ \frac{dz}{dt} &= xy - \beta z \end{align*}\]

where \(x\), \(y\), and \(z\) are the state variables, and \(\sigma\), \(\rho\), and \(\beta\) are parameters. For \(\rho = 28\), \(\sigma = 10\), \(\beta = 8/3\), the state variable \(\mathbf x = [x, y, z]\) is known to be chaotic.

NOTE: If you’ve had enough of the Lorenz system, feel free to use any other system of ODEs or even PDEs that you find interesting.

a) Solve the differential equation numerically using the initial conditions \(\mathbf x = [1, 1, 1]\), a simulation time of \(t_{end}=100\) seconds and \(dt = 0.01\).

b) Given the simulated time series above, and assuming that we don’t know the underlying differential equation, we would like to fit a model that predicts the state \(\mathbf x \equiv \mathbf x(t_i)\) at a given time \(t_i\) based on the previous time-steps:

\[\mathbf x_i = f_\mathbf{w}(\mathbf x_{i-1}, \mathbf x_{i-2}, \ldots, \mathbf x_{i-n})\]

where \(f_\mathbf{w}\) is a neural network. Define the input (feature vector) and output for \(n>1\).

c) Assume \(n=1\). Use a fully connected neural network architecture to train your input-output model. Take the first 80% of the time series as training data, and the last 20% as test data. Plot your results.

d) Repeat the same exercise for \(n>1\). How does your solution compare with the previous question? (Hint: in the first layer, use Flatten() to transform your \(n \times 3\) input to a \(3n\) dimensional input. You have to integrate the solution in a for loop: given the initial condition \([\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf x_4, \mathbf x_5]\), predict \(\mathbf x_6\), then given \([\mathbf x_2, \mathbf x_3, \mathbf x_4, \mathbf x_5, \mathbf x_6]\), predict \(\mathbf x_7\), etc.).

e) Now replace the fully connected network with a Recurrent Neural Network (RNN). Use a simple RNN (e.g. torch.nn.RNN) that takes the sequence \([\mathbf x_{i-n}, \ldots, \mathbf x_{i-1}]\) as input and predicts \(\mathbf x_i\). Train it on the same data and evaluate on the test set. How does the RNN compare to the fully connected network from part (d)?

f) Replace the simple RNN with an LSTM (torch.nn.LSTM). Train and evaluate using the same setup. Compare the LSTM predictions with both the simple RNN and the fully connected network. Which architecture handles the chaotic dynamics best, and why do you think that is?