“Success is not final, failure is not fatal: It is the courage to continue that counts.” – Winston Churchill
Around early 2020, while finishing my graduation thesis, I decided to switch to the field of Data Science. For context, I have a degree in Mechanical Engineering, but my thesis was basically a statistical analysis of a vehicle’s emissions. Then and there I fell in love with statistics; I absolutely needed to know more. Around that time, I found out about the Statistics and Data Science MicroMasters program from MIT. I looked at the courses and thought it would be a good way to build a solid foundation in statistics, and a start towards actually working in the field, so I enrolled.
I took all the required courses from January to December 2020, and the last elective course from February to June 2021. That meant I’d be taking the Capstone Exam in the fall run, around September 2021. But around May of that year, a friend called with a proposal to work on a project about the ventilation of indoor spaces in the midst of the COVID pandemic. I started working on it, and then an email came letting me know it was time to enrol for the Capstone Exam. And I’m aware that I’m not a great test taker, which means I normally need to dedicate more time to prepare.
So I postponed the enrolment again and again, and just like that, two years passed. In January 2023 I was once again between jobs when I received a new email: enrolment was open and the exam was in a month. Completely scared, I clicked through the registration and payment faster than the fear could catch up. Once the money was in, I had to do it.
This is the story of, and my reflections on, finally completing the MicroMasters program. Also, if you want to see the resources I used to prepare, check out the list at the end of the post.
Overcoming the Gap
I’d love to say that I started studying right away and went into the material with fury… I didn’t. The mere idea of opening the courses got my nerves going. But after a week of procrastination, I got around to it. I had a month, and when I enrolled that had seemed like a long time to get ready. Once I opened the material, I knew it was not. I started going through the homework exercises and was astonished at how much I had forgotten. Things that used to be second nature now left me with no idea how to even start. Those two years away were beginning to show.
I needed a strategy for filling the gaps and making the cheatsheets. Wait, before I get ahead of myself, I’ll explain a bit more about the exam. The evaluation consisted of four proctored exams, taken in two parts, that covered the content of the whole program. For each part, we were allowed four pages of cheatsheets that had to be uploaded beforehand. What did I do? I remembered my years of engineering school and began to prioritise the material (80/20, people). I know it’s not ideal, but I was very short on time and knowledge, and maybe a bit desperate.
First I read the exam instructions thoroughly, and it turned out that the topics evaluated were spread across courses. Meaning, I was not tested course by course, but in a more integrated fashion. That meant I had to fill all my knowledge gaps, so I decided to take the “biggest first” approach. For me, that meant I’d spend most of my time reviewing Statistics, covered mainly in the Fundamentals of Statistics course. I also knew I had trouble with Machine Learning, but that course felt a bit more like walking in the dark, so I decided to leave it for later, since it was tested in Part 2 and I had a week between tests.
Now, the cheatsheets: this is where I took a different approach. Normally I would just look up the formulas and copy them straight in. But this time (and especially after the first test), I wanted to thoroughly understand the material and then copy down whatever I thought I would forget. So yes, there were formulas, a lot of them, but I also put concepts and graphs in the cheatsheets to serve as triggers to remember what to do. As a side note, I use Notability because I like my notes handwritten.
Studying for Part 1
First I went through the list of topics and chose to tackle the ones that had given me the most trouble the first time around. The biggest was hypothesis testing: I couldn’t, for the life of me, get my head around the main concepts, and calculating errors and p-values broke my brain… so much frustration. Also, I prefer reading to video, so I started using the recommended book from the course, “All of Statistics” by Larry Wasserman (I can’t recommend it highly enough). I can confidently say that book saved my life, because after two years of forgetting, it just made everything click.
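To make that concrete, here’s a minimal sketch of the kind of p-value calculation I kept fumbling: a two-sided z-test for a mean. The data, the null value, and the known standard deviation are all made-up numbers for illustration; this isn’t an exercise from the exam or the course, and scipy is just one convenient tool for it.

```python
# Two-sided z-test for a mean, assuming the population standard deviation is known.
import numpy as np
from scipy import stats

sample = np.array([2.1, 1.9, 2.4, 2.2, 2.0, 2.3])  # hypothetical measurements
mu_0 = 2.0    # mean under the null hypothesis
sigma = 0.2   # assumed known population standard deviation

# Test statistic: standardised distance of the sample mean from the null mean.
z = (sample.mean() - mu_0) / (sigma / np.sqrt(len(sample)))

# Two-sided p-value: probability of a statistic at least this extreme under H0.
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

If the p-value comes out below your significance level, you reject the null hypothesis; otherwise you don’t. Seeing it as a couple of short lines of code turned out to be a nice complement to the formulas on the cheatsheet.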
I copied the main formulas onto the cheatsheet subject by subject, and after a week and a half of going through the topics I switched to Probability. At that point I had arrived at the part of the Statistics course that covered Bayesian statistics, and I knew those topics were covered more in depth in the Probability course. I used “Introduction to Probability” by Bertsekas and Tsitsiklis (nice and light) as the book reference to study from and to make the Probability and Bayesian statistics cheatsheets. That settled me for the first part of the test, and I made a rule that, as soon as I uploaded the cheatsheets for Part 1, I had to start on Machine Learning.
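As a small illustration of the kind of Bayesian update those chapters deal with (my own toy numbers, not an exercise from either course): if you put a Beta prior on a coin’s bias and observe Binomial data, the posterior is again a Beta distribution, just with updated parameters.

```python
# Beta-Binomial conjugacy: Beta prior + Binomial data -> Beta posterior.
from scipy import stats

alpha_prior, beta_prior = 2, 2   # hypothetical Beta(2, 2) prior on the bias p
heads, tosses = 7, 10            # hypothetical observed data

# Conjugate update: add successes to alpha and failures to beta.
alpha_post = alpha_prior + heads
beta_post = beta_prior + (tosses - heads)
posterior = stats.beta(alpha_post, beta_post)

print(f"Posterior mean of p: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

One line of arithmetic per parameter, which is part of why conjugate priors are such a common teaching example.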
Machine Learning & Part 2
That was a different kind of ordeal, because Machine Learning has both a theoretical side and a practical one. I focused first on the theory, and since my memory of the course was fuzzy and all over the place, I started at the beginning and went in order:
- Classification
- Regression
- Deep Learning
- Unsupervised Learning
- Reinforcement Learning
For reference, the PDF notes from the course Introduction to Machine Learning, available on MIT’s Open Learning Library, were invaluable. And in the case of Deep Learning, the StatQuest videos by Josh Starmer on YouTube actually helped me understand what it was and how it worked. There is also this Medium article on how to determine how many neurons and layers you need, which demystified the whole thing for me. As for Reinforcement Learning, I rewatched all the videos and was surprised by how easy they were to understand.
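To make the layers-and-neurons question a bit more concrete, here’s a minimal sketch of a small feed-forward classifier in PyTorch. The layer sizes are arbitrary numbers chosen for illustration, not a recommendation from the article or the course.

```python
# A tiny feed-forward network: "how many layers/neurons" is literally
# the sizes you pass to each Linear layer.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(10, 32),  # input layer: 10 features -> 32 hidden neurons
    nn.ReLU(),
    nn.Linear(32, 16),  # second hidden layer: 32 -> 16 neurons
    nn.ReLU(),
    nn.Linear(16, 2),   # output layer: 2 classes
)

x = torch.randn(4, 10)   # a batch of 4 hypothetical samples with 10 features each
logits = model(x)
print(logits.shape)      # torch.Size([4, 2])
```

Once the architecture is just a list of layer sizes like this, questions about depth and width stop feeling quite so mystical.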
Small sidenote here: I’ve just found out about a new book by Andrew Glassner called “Deep Learning: A Visual Approach”, and it is a great resource if you want to understand the concepts surrounding Machine Learning without all the math. I personally have difficulties with math, and sometimes understanding things conceptually before bringing in the mathematical representation makes the material easier for me to grasp.
Embracing the Growth
Now, after taking and passing the test, I still can’t quite believe it was possible (I honestly didn’t think I’d make it). It sounds nice to tell it like some sort of hero’s journey with a predestined victory, but then I remember how brutal the study sessions were. Sometimes I would go to bed feeling like my brain had been thrown under a waterfall of knowledge. Which leads me to my biggest lesson from this whole experience: learn out of curiosity, learn to actually understand something.
And I know that’s a hard thing to do, especially when pressed by exams and deadlines. But I guess my time away from academic life has changed my approach to learning (the irony). As a working professional, I now needed to learn new things not just for a grade, but to actually do something with them. That by itself made a huge difference. I also kept learning more about data science and building my own projects (check out this article on my latest NLP project), which helped me improve my technical skills vastly. Those skills, in turn, made it easier to understand how Machine Learning worked. As a consequence, I wasn’t afraid of digging into the code examples and the codebases for the course projects, since I now had a better understanding of Python, and especially PyTorch.
Lessons learned
What I mean to say with all this is that, if you are in a position of having to relearn some dense and complicated material, approach it with curiosity. Go at it, even if you don’t think you can. If you are tight on time, Pareto (the 80/20 rule) is your best friend, but don’t kid yourself into thinking it will yield the results of someone who has spent much more time building up and understanding the material. Skim through the concepts, get a feel for your gaps, and fill in the biggest ones first, even if they take more time than you’d think (they will). It’s going to be difficult, and I know it’s all pretty much standard advice. But that’s how many things are, right? Simple and hard. Although, if I learnt something new from this experience, it was the meaning of internalising knowledge.
That’s what the instructions say when you sign up for the test, and in that context it means that I didn’t get straightforward questions like “Calculate the MLE under X and Y conditions”. Nope, it went further than that: to deliver an answer you needed to piece together information from different parts of the program and understand what you were being asked. This was especially true for Machine Learning and Statistics, which threw me off during the test. So, prepare to be surprised.
Key takeaways
As a whole, this was a very rewarding program, and it felt good to finally finish. It’s difficult and challenging, but you don’t get the feeling that it’s been overcomplicated for the sake of beating up the students, which is something I’ve experienced before. And even if you take some time away, it’s not impossible to pick it up again. In fact, I think allowing the knowledge to sink in is something that can work in your favour.
So get a feel for your knowledge, and then split up the topics. Read the instructions well and see what is going to be covered. Start with the hardest topics; it will give you confidence (it did for me). Also, if you are the type of learner who prefers reading, find good books and written resources (I made a list of mine below). And finally, just go at it. I was badly scared when I did it; my performance between the two parts is a testament to that (the difference is appalling). Although that might also be because, towards the end, I got better at soaking up the material.
Anyways, before I continue rambling: this is the end. I hope it was somewhat entertaining and, overall, useful. If you are studying for this test, or any test, this is your cue to get back to it.
Remember what Churchill said, and happy studying!
Andrea
Resources
Bertsekas, Dimitri P., and John N. Tsitsiklis. Introduction to Probability. 2nd ed., Athena Scientific, 2008.
Gad, Ahmed. ‘Beginners Ask “How Many Hidden Layers/Neurons to Use in Artificial Neural Networks?”’ Medium, 27 June 2018, https://towardsdatascience.com/beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks-51466afa0d3e.
Glassner, Andrew S. Deep Learning: A Visual Approach. No Starch Press, 2021.
Introduction to Machine Learning. MIT Open Learning Library, https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/about. Accessed 7 May 2023.
Starmer, Josh. ‘Neural Networks / Deep Learning.’ StatQuest, 2022. YouTube, https://www.youtube.com/watch?v=zxagGtF9MeU&list=PLblh5JKOoLUIxGDQs4LFFD–41Vzf-ME1.
Wasserman, Larry. All of Statistics: A Concise Course in Statistical Inference. Springer, 2010.