Submit your coding problems in separate folders. Use a Jupyter notebook to provide a brief explanation for each question using markdown cells. I encourage you to discuss the problems, but when you write down the solution, please do it on your own and don’t get a language model to write everything for you! If you use an LLM, explain how. You can raise and answer each other’s questions on Slack, but don’t provide exact solutions. Submit the assignment to Moodle here: Moodle Submission Link before 11:59 PM on Jan 31.

Problem 0: ML True or False Questions (5 points)

Determine if the following statements are True or False and provide a brief explanation:

(a) If the linear predictor is overfitting on the training set, the training set error is much larger than the test set error.

(b) The purpose of cross-validation is to prevent overfitting on the test set.

(c) A development set (or dev set) is used to prevent overfitting on the test set.

(d) Increasing the amount of data will prevent the algorithm from overfitting.

(e) Adding a regularization term will decrease the likelihood of underfitting.

(f) Stochastic gradient descent requires fewer updates on \(\mathbf w\) to converge to the optimal solution.

Problem 1: A Stochastic Gradient Descent to Hooke’s Law (30 points)

In 1676, Robert Hooke discovered the law of elasticity, which states that the extension of a spring is proportional to the force applied to it. In class, we took photos of the extension of the spring for different weights. You are provided with the images and the weights used in the experiment. Your task is to use linear regression to find the parameters of Hooke’s law using both the normal equation and stochastic gradient descent. The true test of your model will be to submit predictions to a Kaggle competition in which you have to estimate the weights of new images.

(a) Create an account and participate in the Kaggle competition here where you’ll get access to the training and test images. Extract the data from the images and create a dataset \(\{ (x^{(i)}, y^{(i)}) \}_{i=1}^{n}\) where \(x^{(i)}\) is the extension of the spring in image \(i\) (in pixels) and \(y^{(i)}\) is the weight used (in Newtons \(\sim 1000M\) where \(M\) is its mass). You can use any method you like to extract the data from the images (e.g., image processing). Provide a brief description of your method.

(b) Given a linear hypothesis \(f_\mathbf{w}(x) = \mathbf w^\top \phi(x)\), build the design matrix \(\mathbf X\), the output vector \(\mathbf y\) from your data, and compute the least squares solution \(\mathbf w^*\) using the normal equation. Explain your choice of feature extractor \(\phi(x)\).

(c) Plot the model \(f_{\mathbf w^*}(x)\) on top of the data. What is the total loss of the model?

(d) Repeat the same exercise using stochastic gradient descent. Start with \(\mathbf w = [0, 0]\), and use a learning rate of \(\eta = 0.01\) (adjusting as needed to converge to a stable solution). Compare the final weights to those obtained from the normal equation on a plot.

(e) Create a plot with two subplots: a) the evolution of the loss as a function of the number of epochs, and b) the evolution of the weights in the \((w_0, w_1)\) plane.

(f) (5 points) Submit your predictions on the test set to Kaggle and report your score. The top 5 on the leaderboard of the hidden test set will get 5 points, the next 5 will get 4 points, the next 5 will get 3 points, etc.
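
As a rough illustration of the two approaches compared in this problem, the sketch below fits a line by the normal equation and by stochastic gradient descent. It runs on synthetic data (a made-up slope, intercept, and noise level), not on the class spring measurements, and it assumes the feature map \(\phi(x) = [1, x]\). The epoch count and all variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the spring data (NOT the class measurements):
# inputs x in [0, 1], outputs y = 3.0 * x + 0.5 plus Gaussian noise.
n = 50
x = rng.uniform(0.0, 1.0, size=n)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.05, size=n)

# Assumed feature map phi(x) = [1, x]: a bias column plus the raw input.
X = np.column_stack([np.ones(n), x])

# Least-squares solution of the normal equation X^T X w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Stochastic gradient descent on the squared loss, one sample per update.
w_sgd = np.zeros(2)
eta = 0.01
n_epochs = 500  # illustrative; tune until the loss curve flattens
for epoch in range(n_epochs):
    for i in rng.permutation(n):
        residual = X[i] @ w_sgd - y[i]
        w_sgd -= eta * 2.0 * residual * X[i]

print("normal equation:", w_normal)
print("SGD:            ", w_sgd)
```

If the inputs are left in raw pixels (values in the hundreds), a step size of 0.01 will likely diverge; rescaling \(x\) or lowering \(\eta\) are two possible remedies.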

Problem 2: Gauss and the Lost Planet (30 points)

In 1801, astronomer Giuseppe Piazzi discovered Ceres from his observatory in Sicily. He tracked it for 41 days before it disappeared into the Sun’s glare. With less than 1% of its orbit observed, astronomers feared Ceres was lost forever. A 24-year-old mathematician named Carl Friedrich Gauss took on the challenge. Using a method he had secretly developed (least squares), Gauss predicted where Ceres would reappear. His prediction was accurate to within half a degree, and Ceres was rediscovered on December 31, 1801.

In this problem, you will work with Piazzi’s actual observations and explore why Gauss’s approach was so revolutionary.

Background: The Data

Piazzi recorded ~21 observations, each consisting of:

  • Date/Time: When the observation was made
  • Right Ascension (RA): Angular position along the celestial equator (like longitude), in hours, minutes, seconds
  • Declination (Dec): Angular position above/below the celestial equator (like latitude), in degrees, minutes, seconds

Download the Ceres observation archive from this GitHub repository.

Important: This file contains over 7,000 observations spanning 1801 to modern times!

  • Piazzi’s observations: Lines with dates A1801 01 and A1801 02 up to Feb 11 (~first 21 lines)
  • Recovery observations: Lines with A1802 01 (when Ceres was rediscovered in late Dec 1801/Jan 1802)

The MPC format looks like:

00001    A1801 01 01.82630 03 38 23.07 +16 17 25.5    MC004535

This means: January 1, 1801 at 0.82630 of a day (~19:50 UTC), RA = 3h 38m 23.07s, Dec = +16° 17’ 25.5”
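
The field layout described above can be sketched as a small parser. This is a minimal sketch that assumes whitespace-separated fields exactly as in the example line; the real archive is a fixed-column format and may contain extra fields, so column slicing may be more robust. The function name `parse_mpc_line` and the dictionary keys are hypothetical.

```python
def parse_mpc_line(line):
    """Split one observation line into raw date, RA, and Dec components.

    Assumes whitespace-separated fields as in the example line above.
    """
    tokens = line.split()
    year = int(tokens[1][1:])       # drop the leading letter flag ("A1801" -> 1801)
    month = int(tokens[2])
    day = float(tokens[3])          # day of month, including the fraction of a day
    ra_hms = tuple(float(tok) for tok in tokens[4:7])
    # Read the sign from the text itself so that a value like "-00" stays negative.
    dec_sign = -1 if tokens[7].startswith("-") else 1
    dec_dms = (abs(float(tokens[7])), float(tokens[8]), float(tokens[9]))
    return {
        "year": year,
        "month": month,
        "day": day,
        "ra_hms": ra_hms,
        "dec_sign": dec_sign,
        "dec_dms": dec_dms,
    }


example = "00001    A1801 01 01.82630 03 38 23.07 +16 17 25.5    MC004535"
obs = parse_mpc_line(example)

# Convert the fractional part of the day into a clock time.
hours_float = (obs["day"] % 1.0) * 24.0
clock_hours = int(hours_float)
clock_minutes = int((hours_float - clock_hours) * 60.0)
print(obs)
print(f"time of day: about {clock_hours:02d}:{clock_minutes:02d}")
```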

Part (a): Data Processing

  1. Parse Piazzi’s observations (Jan 1 to Feb 11, 1801)
  2. Convert dates to “days since January 1, 1801” (let Jan 1 = day 0)
  3. Convert RA to decimal hours: RA_decimal = hours + minutes/60 + seconds/3600
  4. Convert Dec to decimal degrees: Dec_decimal = degrees + minutes/60 + seconds/3600 (watch the sign!)
  5. Create a pandas DataFrame with columns: day, RA, Dec
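
The conversion steps above can be sketched as follows. The only observation used is the single example line quoted earlier; the helper names and the tuple layout of `rows` are hypothetical, and the negative declination in the last line is a made-up value that only illustrates the sign handling.

```python
from datetime import date

import pandas as pd


def ra_to_hours(h, m, s):
    """Right ascension in decimal hours."""
    return h + m / 60.0 + s / 3600.0


def dec_to_degrees(sign, d, m, s):
    """Declination in decimal degrees; the sign multiplies the whole sum."""
    return sign * (d + m / 60.0 + s / 3600.0)


def days_since_epoch(year, month, day_frac, epoch=date(1801, 1, 1)):
    """Days since the epoch, with Jan 1, 1801 at 0h counted as day 0."""
    whole_day = int(day_frac)
    return (date(year, month, whole_day) - epoch).days + (day_frac - whole_day)


# (year, month, fractional day, RA h/m/s, Dec sign/d/m/s) for the example line.
rows = [
    (1801, 1, 1.82630, (3, 38, 23.07), (1, 16, 17, 25.5)),
]

df = pd.DataFrame(
    {
        "day": [days_since_epoch(y, mo, d) for y, mo, d, _, _ in rows],
        "RA": [ra_to_hours(*ra) for _, _, _, ra, _ in rows],
        "Dec": [dec_to_degrees(*dec) for _, _, _, _, dec in rows],
    }
)
print(df)

# Made-up value: -0 deg 30' 00" must come out as -0.5, not +0.5.
print(dec_to_degrees(-1, 0, 30, 0.0))
```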

Part (b): Visualizing the Motion

  1. Plot RA vs. time (days). What do you observe? Is Ceres moving east or west?
  2. Plot Dec vs. time (days). What do you observe?
  3. Plot Dec vs. RA (this shows Ceres’s path across the sky). Add arrows or color to see the direction of motion.

Part (c): Polynomial Fitting

  1. Fit linear, quadratic, and cubic models to RA(t) and Dec(t)
  2. Plot the residuals for each model. Which degree polynomial fits best?
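
One possible way to set up the fits is with `np.polyfit` and `np.polyval`. The sketch below uses synthetic data (a made-up quadratic trend plus noise over a 41-day window), not Piazzi's observations; `fit_residuals` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for RA(t): NOT Piazzi's data.
t = np.linspace(0.0, 41.0, 21)
y = 3.6 + 0.002 * t + 0.0004 * t**2 + rng.normal(0.0, 0.001, t.size)


def fit_residuals(t, y, degree):
    """Least-squares polynomial fit; returns coefficients and residuals."""
    coeffs = np.polyfit(t, y, degree)
    return coeffs, y - np.polyval(coeffs, t)


rms = {}
for degree in (1, 2, 3):
    _, residuals = fit_residuals(t, y, degree)
    rms[degree] = float(np.sqrt(np.mean(residuals**2)))
    print(f"degree {degree}: RMS residual = {rms[degree]:.6f}")
```

Because each lower-degree model is a special case of the next one, the in-sample RMS residual cannot increase with the degree. A smaller training residual therefore does not by itself show that a higher degree is the better model.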

Part (d): The Extrapolation Problem

Ceres was rediscovered on December 31, 1801 (around day 364). Use your polynomial models to predict where Ceres should have been:

  1. Extrapolate your linear, quadratic, and cubic fits to day 364
  2. Report the predicted RA and Dec for each model
  3. The actual position when rediscovered was approximately RA = 12.7h, Dec = 11\(^\circ\). How far off are your predictions?
  4. Plot the (RA, Dec) trajectory from day 0 to day 400 for each polynomial. What goes wrong?
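
To see how a good fit inside the observation window can still extrapolate poorly, the sketch below fits polynomials to 41 days of a noisy sinusoid and evaluates them at day 364. The signal is synthetic: the 1680-day period and the noise level are illustrative assumptions, and the curve is not a model of Ceres's RA or Dec.

```python
import numpy as np

rng = np.random.default_rng(3)
PERIOD = 1680.0  # days; illustrative value only


def signal(t):
    """Synthetic periodic 'truth' used only for this demonstration."""
    return np.sin(2.0 * np.pi * t / PERIOD)


# Short observation window, mimicking 41 days of data.
t_obs = np.linspace(0.0, 41.0, 21)
y_obs = signal(t_obs) + rng.normal(0.0, 1e-3, t_obs.size)

t_future = 364.0
in_sample_rms = {}
extrap_error = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(t_obs, y_obs, degree)
    fit_error = np.polyval(coeffs, t_obs) - y_obs
    in_sample_rms[degree] = float(np.sqrt(np.mean(fit_error**2)))
    extrap_error[degree] = float(np.polyval(coeffs, t_future) - signal(t_future))
    print(
        f"degree {degree}: in-sample RMS = {in_sample_rms[degree]:.4f}, "
        f"error at day {t_future:.0f} = {extrap_error[degree]:+.3f}"
    )
```

All three fits look nearly perfect inside the window, yet their values at day 364 can differ widely. Changing the random seed is a quick way to check how sensitive the higher-degree extrapolations are to the noise.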

Part (e): Why Gauss Needed Kepler

Your polynomial extrapolations should fail spectacularly. In today’s terms, we’d call Gauss’s approach “physics-informed”. Gauss used two key insights from Kepler’s laws:

  1. Kepler’s First Law (Shape): Planetary orbits are ellipses. This constrains the shape of Ceres’s path in the sky.

  2. Kepler’s Second Law (Timing): A planet sweeps equal areas in equal times. This tells us how fast Ceres moves along its elliptical path—and it’s not constant!

Fitting the shape alone isn’t enough. Even if you know the orbit is an ellipse, you still need to predict where on that ellipse Ceres will be at a future time. Let’s explore both pieces:


Part 1: Fitting the Orbital Shape

The general equation for a conic section (ellipse, parabola, or hyperbola) is:

\[Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0\]

If we set \(F = 1\), this becomes:

\[Ax^2 + Bxy + Cy^2 + Dx + Ey = -1\]

(a) Is this equation linear or nonlinear in the parameters \((A, B, C, D, E)\)? Explain your reasoning.

(b) Write down the design matrix \(\mathbf{X}\) and target vector \(\mathbf{y}\) that would allow you to solve for \((A, B, C, D, E)\) using least squares. Use your Piazzi data points \((RA_i, Dec_i)\) as \((x_i, y_i)\).

(c) Fit the conic section to Piazzi’s observations using least squares and report the coefficients \((A, B, C, D, E)\).
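
The least-squares setup for the conic with \(F = 1\) can be sketched on synthetic points. The example below samples a known ellipse, \(4x^2 + 9y^2 - 16x - 18y - 11 = 0\) (chosen arbitrarily, not derived from Ceres), and checks that the fit recovers its coefficients after dividing by \(-11\) so that \(F = 1\). The helper name `fit_conic` is hypothetical.

```python
import numpy as np


def fit_conic(x, y):
    """Least-squares fit of A x^2 + B xy + C y^2 + D x + E y = -1."""
    X = np.column_stack([x**2, x * y, y**2, x, y])
    target = -np.ones_like(x)
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs


# Noise-free points on the ellipse (x - 2)^2 / 9 + (y - 1)^2 / 4 = 1.
theta = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
x = 2.0 + 3.0 * np.cos(theta)
y = 1.0 + 2.0 * np.sin(theta)

A, B, C, D, E = fit_conic(x, y)
print("A, B, C, D, E =", A, B, C, D, E)

# Expected coefficients: the expanded ellipse equation divided by -11.
expected = np.array([-4.0, 0.0, -9.0, 16.0, 18.0]) / 11.0

# A conic is an ellipse when the discriminant B^2 - 4AC is negative.
discriminant = B**2 - 4.0 * A * C
print("discriminant =", discriminant)
```

With real observations covering only a short arc, the columns of the design matrix can be nearly collinear, so the recovered coefficients may be far more sensitive to noise than in this full-ellipse example.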


Part 2: The Time Problem

Suppose you’ve successfully fit an ellipse to Piazzi’s observations. You now know the shape of the orbit. But to predict where Ceres will be on day 364, you need to know how position on the ellipse relates to time.

This is governed by Kepler’s Equation:

\[M = \kappa - e \sin(\kappa)\]

where:

  • \(M = \frac{2\pi}{T}(t - t_0)\) is the mean anomaly, which increases linearly with time \(t\)
  • \(\kappa\) is the eccentric anomaly, an auxiliary angle that determines position on the ellipse
  • \(e\) is the orbital eccentricity
  • \(T\) is the orbital period

The problem: Given a future time \(t\), you can easily compute \(M\). But to find the position, you need \(\kappa\)—and solving \(M = \kappa - e\sin(\kappa)\) for \(\kappa\) cannot be done analytically!

(a) Is Kepler’s equation linear or nonlinear in \(\kappa\)? Explain why it cannot be solved analytically.

(b) Use Newton’s method to solve Kepler’s equation for \(\kappa\) when \(M = 1.5\) radians and \(e = 0.08\) (approximately Ceres’s eccentricity). Start with initial guess \(\kappa_0 = M\). Report your answer after 5 iterations.

Hint: Newton’s method update is \(\kappa_{n+1} = \kappa_n - \frac{f(\kappa_n)}{f'(\kappa_n)}\) where \(f(\kappa) = \kappa - e\sin(\kappa) - M\).
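
The update in the hint can be written as a short loop. This is a minimal sketch: the function name `solve_kepler` and the fixed iteration count are illustrative choices, and a tolerance-based stopping rule would be a reasonable alternative.

```python
import math


def solve_kepler(M, e, n_iter=5):
    """Newton iterations for f(kappa) = kappa - e*sin(kappa) - M = 0."""
    kappa = M  # initial guess kappa_0 = M
    for _ in range(n_iter):
        f = kappa - e * math.sin(kappa) - M
        f_prime = 1.0 - e * math.cos(kappa)
        kappa -= f / f_prime
    return kappa


kappa = solve_kepler(M=1.5, e=0.08)
residual = kappa - 0.08 * math.sin(kappa) - 1.5
print("kappa =", kappa)
print("residual =", residual)
```

Since \(f'(\kappa) = 1 - e\cos(\kappa) \geq 1 - e > 0\) for \(e < 1\), the derivative never vanishes, so the Newton step is always well defined for elliptical orbits.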


Final Reflection: In 2-3 sentences, explain what Gauss gained by combining Kepler’s laws (elliptical orbit + time-position relationship) that polynomial fitting alone couldn’t provide.

Problem 3: Galileo GalilAI (30 points)

In 1602, Galileo started conducting experiments on the pendulum that ultimately led to his discovery of the relationship between its period and length:

\[T = 2 \pi \sqrt{ \frac{L}{g} }\]

Imagine being in his shoes, except with a smartphone and machine learning at your disposal. This is an open-ended problem whose purpose is to get you familiar with data collection, pre-processing, and linear regression.

Download an app on your phone that allows you to save sensor data. I recommend the Physics Toolbox Sensor Suite app, which seems to be available on both iPhone and Android platforms. Hang your phone at the end of a string and use it as a pendulum. You can use your charger cable as the string, but you might want to put a cushiony surface under it in case the phone falls off the cable. Perform pendulum free oscillation experiments (as Galileo probably did) with different string lengths. Let \(x^{(i)}(t)\) be the sensor measurement for every experiment \(i\) with string length \(L^{(i)}\). \(x^{(i)}(t)\) can be the angular velocity if you are using the gyroscope, or the linear acceleration if you are using the accelerometer. We are interested in predicting the period of oscillation \(T\) from the length of the string \(L\).

(a) Briefly describe your setup and data collection method. For example, how did you measure \(L\)? How many data points did you use?

(b) Data cleaning: remove pre- and post-oscillation measurements from your sensor data and plot the time series for each experiment \(i\) (you can do so on the same plot or on separate subplots). If you are using a gyroscope that truncates values between \(0\) and \(2 \pi\), process the data to get a sinusoidal-looking time series.

(c) Transform (algorithmically, not by hand) the time series data \(x^{(i)}(t)\) into an average period of oscillation \(T^{(i)}\) for each experiment \(i\). Briefly describe the method you used. (Hint: what’s the best way to find the frequencies in time-series data?)

(d) Split your data into training and test sets, and plot \(T^{(i)}\) as a function of \(L^{(i)}\) for all data points \(i\).

(e) What is your input \(x\) and output \(y\)? Suggest a few choices for the feature extractor \(\phi(x)\) and write down \(f_\mathbf{w}(x) = \phi(x) \cdot \mathbf w\). Explain the motivation behind your choices. (Hint: what happens if you take the \(\log\) of the data? What happens if you use a polynomial of degree higher than \(2\)? Discuss these issues in the next question.)

(f) Depending on your choice of \(\phi()\), \(x\) and \(y\), use linear regression, ridge regression (with \(L_2\) regularization) or Lasso (with \(L_1\) regularization) with scikit-learn and compare your results to the theory. What did you obtain for \(g\)?

(g) Briefly describe whether the data points you collected were enough and if you had to iterate to get more measurements.

(h) Use LassoCV to optimize over the hyperparameter associated with the magnitude of the \(L_1\) regularization term in the Lasso loss function.
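
As one possible end-to-end illustration of this problem, the sketch below generates synthetic pendulum signals (not phone measurements), estimates each period from the peak of the FFT spectrum, and fits a line in log-log space, where \(\log T = \tfrac{1}{2}\log L + \log(2\pi/\sqrt{g})\). The string lengths, sampling rate, noise level, and helper name `dominant_period` are all assumptions. `np.polyfit` stands in for scikit-learn here to keep the sketch short; the same fit could be done with a scikit-learn regressor.

```python
import numpy as np

G_TRUE = 9.81  # m/s^2, used only to generate the synthetic signals
rng = np.random.default_rng(2)


def dominant_period(x, dt):
    """Period of the strongest non-zero frequency in a uniformly sampled signal."""
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]


dt = 0.01                  # assumed sampling interval in seconds
t = np.arange(6000) * dt   # 60 seconds of samples
lengths = np.array([0.2, 0.4, 0.6, 0.8, 1.0])  # assumed string lengths in meters

periods = []
for L in lengths:
    T_true = 2.0 * np.pi * np.sqrt(L / G_TRUE)
    x = np.sin(2.0 * np.pi * t / T_true) + rng.normal(0.0, 0.05, t.size)
    periods.append(dominant_period(x, dt))
periods = np.array(periods)

# Linear regression in log-log space: slope ~ 1/2, intercept = log(2*pi/sqrt(g)).
slope, intercept = np.polyfit(np.log(lengths), np.log(periods), 1)
g_est = (2.0 * np.pi / np.exp(intercept)) ** 2
print("slope =", slope)
print("estimated g =", g_est)
```

The FFT frequency resolution is the inverse of the recording duration, so short recordings limit how precisely the period can be read from the spectral peak. Longer recordings, zero-padding with peak interpolation, or averaging the times between zero crossings are possible refinements.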

Problem 4: Research Question (10 points)

Write a proposal for discovering a new empirical law in your field of interest using machine learning. Your proposal should include the following:

  • A description of the empirical law you want to discover.
  • The type of data you would need to collect (features and labels).
  • The machine learning techniques you would use to analyze the data.
  • The physical constraints you would impose on the model and some ideas on how you would do it.
  • At least 5 references to scientific articles that support your proposal.

Hint: You can look into scientific articles that use machine learning to discover empirical laws. For example, this article uses symbolic regression to discover physical laws from experimental data.