Problem Set #3 - Project Pre-Proposal and Neural Networks
If you use notebooks, provide a brief explanation for each question using the markdown functionality. I encourage you to discuss the problems, but when you write down the solution, please do it on your own. Don’t use LLMs without explaining how you’ve used them (undisclosed usage may cost points). You can raise and answer each other’s questions on Slack, but please don’t share exact solutions. Submit Jupyter notebooks (or scripts) for the coding problem on Moodle via this link: Moodle Assignment Submission Link, before the due date of February 24 at 23:59.
Problem 0: Project Pre-Proposal (50 points)
Write a one-page project pre-proposal and post it on the course Slack channel. The pre-proposal should contain:
- A clear scientific question you want to investigate.
- The dataset you plan to use (real data preferred; synthetic data acceptable if the methodological contribution is clear). Include a link to the data source if publicly available.
- The machine learning method(s) you plan to apply (SINDy, symbolic regression, PINNs, neural ODEs, DMD, autoencoders, or others).
- The scientific inductive biases you intend to incorporate (conservation laws, symmetries, known interaction terms, spatiotemporal structure, etc.).
- At least 3 references: the original method paper, a paper applying it to a similar system, and a paper describing the scientific domain.
Keep it to one page. This is not a contract; you are allowed to pivot later. The purpose is to get early feedback and start thinking concretely about scope.
Refer to the full project requirements for details on expectations, grading, and project ideas.
Grading: The pre-proposal is graded together with the progress report (Week 7) for a combined 5% of the course grade. What I look for at this stage is evidence that you have read at least one relevant paper, identified a dataset, and thought about what makes your project scientific rather than a generic ML exercise.
Problem 1: Neural Network as a PDE Surrogate (50 points)
In the previous homework, you solved the diffusion equation using finite differences. Numerical solvers are accurate but can be expensive for fine grids or repeated evaluations. An alternative is to train a neural network to approximate the solution directly: given coordinates \((x, t)\), the network predicts \(u(x, t)\) without running the solver. This is called a surrogate model.
In this problem, you will generate data from a diffusion equation solver, train a neural network to approximate the full solution field, and investigate how well the network generalizes in time.
Setup
Consider the one-dimensional diffusion equation:
\[\frac{\partial u}{\partial t} = D \frac{\partial^2 u}{\partial x^2}\]
on the domain \(x \in [-2, 2]\), \(t \in [0, 2]\), with diffusion coefficient \(D = 0.02\), boundary conditions \(u(-2, t) = u(2, t) = 0\), and initial condition:
\[u(x, 0) = \exp\left(-\frac{x^2}{2 \sigma^2}\right), \quad \sigma = 0.2\]
Part (a): Data Generation (5 points)
Solve the diffusion equation numerically using the forward Euler finite difference scheme:
\[u_i^{n+1} = u_i^n + \frac{D \, \Delta t}{\Delta x^2} \left( u_{i-1}^n - 2 u_i^n + u_{i+1}^n \right)\]
Use \(\Delta x = 0.01\) and choose \(\Delta t\) such that the CFL condition \(D \Delta t / \Delta x^2 < 0.5\) is satisfied. Plot the solution at several time snapshots to verify it looks reasonable (the Gaussian should spread and flatten over time).
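The scheme above can be sketched in NumPy as follows. The grid follows the problem statement; the safety factor of 0.4 on the CFL limit is an illustrative choice, not a requirement.

```python
import numpy as np

D, sigma = 0.02, 0.2
x = np.linspace(-2, 2, 401)          # dx = 0.01
dx = x[1] - x[0]
dt = 0.4 * dx**2 / D                 # D*dt/dx^2 = 0.4 < 0.5 (CFL satisfied)
n_steps = int(np.ceil(2.0 / dt))
t = np.arange(n_steps + 1) * dt

u = np.zeros((n_steps + 1, x.size))
u[0] = np.exp(-x**2 / (2 * sigma**2))   # Gaussian initial condition

r = D * dt / dx**2
for n in range(n_steps):
    # vectorized forward Euler update on interior points
    u[n + 1, 1:-1] = u[n, 1:-1] + r * (u[n, :-2] - 2 * u[n, 1:-1] + u[n, 2:])
    # boundaries stay at zero (Dirichlet conditions)
```

The rows of `u` are the time snapshots; plotting a handful of them (e.g. every 200 steps) should show the Gaussian spreading and flattening.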
Part (b): Preparing the Training Data (5 points)
Construct a dataset of input-output pairs \(\{(x_i, t_i), u(x_i, t_i)\}\) from the numerical solution. Split the data as follows:
- Training set: all points with \(t \leq 1.5\) (75% of the time domain).
- Test set: all points with \(t > 1.5\) (the remaining 25%).
Shuffle the training set. Optionally reserve a validation subset (e.g. 20% of the training data) for monitoring overfitting during training.
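One way to build this split is sketched below. The analytic placeholder field `u` stands in for the solver output of Part (a) so the snippet runs on its own; swap in your actual solution array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for the Part (a) solver output: u has shape (n_t, n_x)
x = np.linspace(-2, 2, 401)
t = np.linspace(0, 2, 1001)
u = np.exp(-x[None, :] ** 2) * np.exp(-t[:, None])

# Flatten the field into (x, t) -> u pairs
X, T = np.meshgrid(x, t)                              # both (n_t, n_x)
inputs = np.stack([X.ravel(), T.ravel()], axis=1)     # columns: x, t
targets = u.ravel()

# Time-based split: train on t <= 1.5, test on t > 1.5
train_mask = inputs[:, 1] <= 1.5
X_train, y_train = inputs[train_mask], targets[train_mask]
X_test, y_test = inputs[~train_mask], targets[~train_mask]

# Shuffle the training set, then hold out 20% as a validation subset
perm = rng.permutation(len(X_train))
X_train, y_train = X_train[perm], y_train[perm]
n_val = len(X_train) // 5
X_val, y_val = X_train[:n_val], y_train[:n_val]
X_train, y_train = X_train[n_val:], y_train[n_val:]
```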
Part (c): Building and Training the Neural Network (15 points)
Build a feedforward neural network in PyTorch with:
- Input: 2 features \((x, t)\)
- Output: 1 value \(\hat{u}(x, t)\)
- Architecture: at least 2 hidden layers with an activation function of your choice (try `ReLU`, `Tanh`, or `ELU`). Start with 20 to 50 neurons per layer.
Train the network using the MSE loss and the Adam optimizer. Plot the training loss (and validation loss if applicable) as a function of epoch. Train for enough epochs that the loss plateaus.
Tips:
- Normalize or standardize the inputs if needed.
- Use `torch.utils.data.DataLoader` for batching.
- A learning rate around \(10^{-3}\) is a reasonable starting point.
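A minimal sketch of the network and training loop is shown below. The class and function names (`Surrogate`, `train`), layer widths, batch size, and epoch count are illustrative starting points, and the inputs are assumed to already be float32 tensors (normalized if needed).

```python
import torch
import torch.nn as nn

class Surrogate(nn.Module):
    """Feedforward net mapping (x, t) -> u_hat(x, t)."""
    def __init__(self, width=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )

    def forward(self, xt):
        return self.net(xt)

def train(model, X_train, y_train, epochs=200, batch_size=256, lr=1e-3):
    """Train with MSE loss and Adam; return per-epoch training loss."""
    ds = torch.utils.data.TensorDataset(X_train, y_train)
    loader = torch.utils.data.DataLoader(ds, batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            total += loss.item() * len(xb)
        history.append(total / len(ds))
    return history
```

Plotting `history` (and an analogous validation curve) against epoch shows when the loss plateaus.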
Part (d): Evaluating the Surrogate (10 points)
Compare the neural network predictions with the numerical solution:
- Plot the NN prediction \(\hat{u}(x, t)\) and the true solution \(u(x, t)\) side by side at 4 to 6 time snapshots spanning both the training and test time ranges.
- Compute the mean squared error at each time step and plot it as a function of time. Draw a vertical line at \(t = 1.5\) to mark the boundary between training and test data.
- Does the error grow beyond the training horizon? By how much?
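The per-time-step error curve can be computed as sketched below, where `model` is the trained surrogate from Part (c) and `u`, `x`, `t` are the solver outputs from Part (a); the helper name `mse_per_time` is illustrative.

```python
import numpy as np
import torch

def mse_per_time(model, u, x, t):
    """Return one MSE value per time snapshot, comparing the NN to the solver."""
    x_torch = torch.tensor(x, dtype=torch.float32)
    errors = []
    with torch.no_grad():
        for n, tn in enumerate(t):
            # evaluate the surrogate on the whole spatial grid at time tn
            xt = torch.stack([x_torch, torch.full_like(x_torch, float(tn))], dim=1)
            pred = model(xt).squeeze(-1).numpy()
            errors.append(float(np.mean((pred - u[n]) ** 2)))
    return np.array(errors)
```

Plot the returned array against `t` and add `plt.axvline(1.5)` to mark the train/test boundary.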
Part (e): Architecture Experiment (5 points)
Try at least two different architectures (e.g. varying the number of layers, neurons per layer, or activation function). Report the test MSE for each and briefly discuss which choices mattered most.
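A compact way to organize the sweep is a small model factory plus a test-set evaluator, as sketched below; the configurations, helper names (`make_model`, `eval_mse`), and widths are illustrative, and the training loop from Part (c) is reused for each model.

```python
import torch
import torch.nn as nn

def make_model(widths, act):
    """Build a feedforward (x, t) -> u network with the given hidden widths."""
    layers, d_in = [], 2
    for w in widths:
        layers += [nn.Linear(d_in, w), act()]
        d_in = w
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

# illustrative configurations: vary depth, width, and activation
configs = {
    "2x20_tanh": ([20, 20], nn.Tanh),
    "2x50_relu": ([50, 50], nn.ReLU),
    "3x30_tanh": ([30, 30, 30], nn.Tanh),
}

def eval_mse(model, X_test, y_test):
    """Test MSE for a trained model."""
    with torch.no_grad():
        return nn.functional.mse_loss(model(X_test), y_test).item()
```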
Bonus: Learning the Time Integrator (10 points)
Instead of learning the map \((x, t) \to u\), try learning the one-step time integrator. Train a network that takes a local stencil \((u_{i-1}^n, u_i^n, u_{i+1}^n)\) as input and predicts \(u_i^{n+1}\). Then iterate the learned integrator forward in time starting from the initial condition. Compare the result with the finite difference solution. Does this approach extrapolate better than the direct surrogate from Part (d)?
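The stencil network and its rollout can be sketched as below. The architecture and the `rollout` helper are illustrative; training the network on `(stencil, next value)` pairs extracted from the Part (a) solution is assumed and not shown.

```python
import numpy as np
import torch
import torch.nn as nn

# One-step map: (u_{i-1}^n, u_i^n, u_{i+1}^n) -> u_i^{n+1}
stencil_net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))

def rollout(net, u0, n_steps):
    """Iterate the learned integrator forward; boundaries held at zero."""
    traj = [u0.copy()]
    u = u0.copy()
    for _ in range(n_steps):
        # gather the 3-point stencil at every interior grid point
        stencils = np.stack([u[:-2], u[1:-1], u[2:]], axis=1)
        with torch.no_grad():
            inner = net(torch.tensor(stencils, dtype=torch.float32)).squeeze(-1).numpy()
        u = np.zeros_like(u)
        u[1:-1] = inner          # Dirichlet: endpoints stay zero
        traj.append(u.copy())
    return np.array(traj)
```

Because the stencil map is local and time-translation invariant, it has a chance of extrapolating past \(t = 1.5\) better than the direct \((x, t) \to u\) surrogate; comparing the rollout against the finite difference solution tests this.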