A Beginner’s Checklist to Starting Your First Machine Learning Project

Thinking about setting up your first machine learning project and don’t know where to start? This beginner’s checklist will walk you through a step-by-step thought process to get you started!

☑️ Step 1: Get a feel for what Machine Learning is all about 🙂

I’m assuming you arrived on this blog because you’ve heard of the concept of Machine Learning (ML) and Artificial Intelligence (AI) and watched a couple of videos here and there!

If you haven’t done so already, you can explore more through watching some cool TED talks here.

You can also explore an online course – there are many free courses available. You can check out this one by Udacity on ‘Intro to Machine Learning’ 🤖.

Don’t worry about writing the code yet; just get a feel for what’s happening in the Machine Learning world. Machine Learning is one field within Artificial Intelligence (AI), and AI covers many other fields too. Here’s a fantastic blog if you would like to explore more about ‘Machine Learning vs. Artificial Intelligence’.

Ok, onto the next step! 😊 Don’t worry about achieving something perfect the first time round, the best way to learn is to get stuck into a small project.

☑️ Step 2: Try out a small project!

First things first, follow a tutorial to help you get started! Build something small to begin with and ask questions like:

  • ‘What does the data source look like?’
  • ‘How is the data being formatted?’
  • ‘What is the function of the code?’
  • ‘Why is this line of code here?’
  • ‘How is the machine learning model working?’
  • ‘How is this implemented in the code?’

This tutorial from Scikit-Learn is a good starting point to help you to get stuck in. The example shows how Scikit-Learn can be used to recognise images of hand-written digits.

Experiment! Try changing the type of classifier and performance metrics to see if this makes a difference to the ability of your model to identify the handwritten digits.
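
If you'd like a starting point for that experiment, here is a minimal sketch. It is not the Scikit-Learn tutorial itself: the two classifiers, the two metrics, the 50/50 split and the `results` dictionary are my own choices for illustration, so swap in whatever you want to compare.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the built-in handwritten digits dataset (8x8 images, already flattened in .data).
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Shuffle and hold back half of the images for testing (the split size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Two classifiers to compare; try adding your own here.
classifiers = {
    "SVC": SVC(gamma=0.001),
    "LogisticRegression": LogisticRegression(max_iter=5000),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    predicted = clf.predict(X_test)
    # Two different performance metrics computed on the same predictions.
    results[name] = {
        "accuracy": metrics.accuracy_score(y_test, predicted),
        "macro_f1": metrics.f1_score(y_test, predicted, average="macro"),
    }
    print(name, results[name])
```

Run it a few times with different classifiers and see whether the ranking changes depending on which metric you look at.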

Congratulations! 🎉😎 You just built your first machine learning project! Take it easy ok, there’s a lot to take in already.

☑️ Step 3: What’s the problem you’re trying to solve?

Once you’ve tried one or two example projects, you can start to tackle your very own!

Here are some questions to help you:

Is Machine Learning the right approach for your project?

Sketch out some ideas in your notebook and refine your idea. What questions are you trying to answer? What is your goal? Start small! Machine Learning may or may not be the right approach for your project, so before you invest a lot of time, share your idea around to sense-check that it is right for you.

Are you trying to work with images? Are you working with numerical data?

Understand what kind of data you will be working with – this will guide you towards the appropriate solution for your problem.

☑️ Step 4: Data Acquisition and Understanding the Dataset

Where are you going to get your dataset from?

Before you can build the Machine Learning model, you need access to a dataset. For all projects, data acquisition is a very important step.

How big is your dataset? Is it the best dataset for your project? Are there issues with the data?

Delve into your dataset; understand its structure. What is the format of the data? What are the key features of the dataset? Which parts of the dataset do you want to capture? Which bits are relevant? Is your dataset big enough?

N.B. You may not need all of your dataset. Be aware of biases in the dataset sample itself!
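
To make those questions a bit more concrete, here is a rough sketch of a first look at a dataset using only the Python standard library. The `summarise` helper and the toy cat/dog rows are made up for illustration; your real data will have its own columns and labels.

```python
from collections import Counter


def summarise(rows, label_key):
    """Return basic facts about a dataset stored as a list of dicts."""
    # How many examples of each label? A lopsided count hints at sampling bias.
    label_counts = Counter(row[label_key] for row in rows)

    # Count missing (None) values per column.
    missing = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                missing[key] += 1

    return {
        "n_rows": len(rows),
        "features": sorted(key for key in rows[0] if key != label_key),
        "label_counts": dict(label_counts),
        "missing_per_column": dict(missing),
    }


# Hypothetical toy data, not from a real source.
rows = [
    {"height_cm": 25, "weight_kg": 4.0, "label": "cat"},
    {"height_cm": 60, "weight_kg": None, "label": "dog"},
    {"height_cm": 23, "weight_kg": 3.5, "label": "cat"},
    {"height_cm": 24, "weight_kg": 4.2, "label": "cat"},
]

summary = summarise(rows, "label")
print(summary)
```

Here three of the four rows are cats, so a model trained on this sample would see very few dogs. That is exactly the kind of bias worth spotting early.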

Wow! That’s a big step out of the way, now onto choosing your model. 🙂

☑️ Step 5: Which modelling approach is suitable for the domain you’re working in?

Are you going to let the model find patterns in unlabelled data by itself (unsupervised learning), or are you going to guide the training with labelled examples (supervised learning)? From the previous steps, you should have the gist of the problem type. Is it a classification, regression or clustering problem, or something else?

Here’s a cool Machine Learning Map to help you decide.

☑️ Step 6: Data Processing and Formatting

Ok, data is never in the form you want it to be…there will be some data processing and formatting to get the data in a form that’s suitable for your machine learning project.
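
What that processing looks like depends entirely on your data, but two common chores are filling in missing values and rescaling numbers to a common range. Here is a small sketch of both in plain Python. The helper names and the sample values are my own assumptions, and filling gaps with the mean is only one of several possible choices.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with one missing entry.
raw = [10, None, 30, 20]
filled = fill_missing_with_mean(raw)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```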

☑️ Step 7: Machine Learning

There are so many options out there. Best to explore for yourself and pick what rocks your boat 🚣. TensorFlow and Keras are a good combo, and so is Scikit-Learn 🙂 There are pros and cons to whichever technologies you choose. If you want, you can even set up an online coding notebook like a Colab notebook 📔 (pretty much a Jupyter notebook for the Python fans out there), so you can experiment a bit. Did I mention you can run your machine learning using a GPU for super speedy stuff?

If you want a quick run down on the techniques of Machine Learning, check out the crash course from Google.

☑️ Step 8: Data Splitting

Once you have your dataset ready, the next thing to consider is splitting it into a training dataset and a testing dataset. The training dataset is the data your ML model will train on; the testing dataset is the data your model will be tested against to check how well it performs.

Top Tip! It is important to randomise the dataset before you split it, so the order of your dataset doesn’t have a major impact on the model training process.

There are many mathematical approaches to measuring model performance, but it is important to be aware of model overfitting. This is when the model fits the training dataset too closely (noise and all), so it scores well on the training data but performs poorly on data it has never seen.

The rule of thumb for proportions is generally 90% of the dataset for training / 10% of the dataset for testing, but we have also seen 75% / 25% splits as well as 80% / 20% splits.
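
Here is a minimal sketch of shuffling and then splitting, using only Python's standard library. The `shuffle_and_split` helper, the fixed seed and the list of 100 numbers standing in for a dataset are assumptions for illustration. In a real project you might reach for a library helper such as Scikit-Learn's `train_test_split` instead.

```python
import random


def shuffle_and_split(samples, test_fraction=0.1, seed=42):
    """Shuffle a copy of the samples, then split into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")

    shuffled = list(samples)               # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable

    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 100 samples, represented here by the numbers 0..99.
data = list(range(100))
train, test = shuffle_and_split(data, test_fraction=0.1)
print(len(train), len(test))
```

Change `test_fraction` to 0.2 or 0.25 to try the other proportions mentioned above.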

☑️ Step 9: Model Training

Model training is the official term to mean “Run the Machine Learning model LOL! It’s about time!” All the hard work so far has paid off! You are ready to train your model! Good luck! 👍

Here is a non-exhaustive list of the things you may want to consider:

  1. Where are you going to do the model training? If your dataset is massive, you may need to consider how long the training process will take.
  2. Consider doing test runs on a small sample of your dataset to check that your model can actually train! Seriously, you don’t want to be waiting around for ages and come back to find that there were bugs in the way you interfaced the data to the machine learning model! (Been there and done that LOL 😭)
  3. How many times is your model going to run through the training dataset?
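
To show what a quick test run on a tiny sample and "number of times through the dataset" can look like, here is a toy sketch. It fits a straight line with plain gradient descent rather than a real ML library, and the function name, learning rate, epoch count and five data points are all assumptions made up for the example.

```python
def train_linear(xs, ys, epochs=200, learning_rate=0.05):
    """Fit y = w * x + b with full-batch gradient descent.

    Each epoch is one complete pass through the training data.
    Returns the learned w and b plus the loss after every epoch.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Mean squared error after this epoch's update.
        loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append(loss)
    return w, b, history


# Tiny hypothetical sample generated from y = 2x + 1, used as a smoke test.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b, history = train_linear(xs, ys, epochs=500)
print(f"w={w:.3f}, b={b:.3f}, first loss={history[0]:.3f}, last loss={history[-1]:.6f}")
```

If the loss does not go down on a sample this small, something is probably wrong with how the data is being fed in, and it is much cheaper to find that out now than after hours of training.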

☑️ Step 10: Model Fitting & Model Tuning

Once you have a trained machine learning model, check how well it performs by testing it against a test dataset (a fancy way of saying the “data your machine learning model has never seen before”).

Have a think about how you measure the model performance.
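
For a classification problem, two simple starting points are accuracy (the fraction of predictions that are right) and a count of which labels get confused with which. Here is a small sketch in plain Python. The helper names and the cat/dog labels are invented for illustration, and accuracy is only one of many possible metrics.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


# Hypothetical test-set labels and model predictions.
y_true = ["cat", "dog", "cat", "cat", "dog"]
y_pred = ["cat", "dog", "dog", "cat", "dog"]

acc = accuracy(y_true, y_pred)
confusion = confusion_counts(y_true, y_pred)
print(acc)
print(confusion)
```

Computing the same accuracy on the training data and on the test data is a quick overfitting check: a big gap between the two suggests the model has memorised the training set.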

Here are some strategies to improve the performance of your machine learning model. Beware of overfitting, of course!

  1. Go back to the data source! Is this the best data source for your model? Are there any pitfalls in your selected dataset? If the data source is sound, maybe you can increase the sample size (how much data you’re using).
  2. Try choosing another machine learning algorithm and do a comparison exercise to see which one yields the best result.
  3. Play around with the proportion of data you set aside for training and testing.
  4. Refine the training process: see if you can increase the number of times you run through the training dataset, although this will slow down the training process.

Final Thoughts

You totally rock! Give yourself a pat on the back! Congratulations on doing Machine Learning 🎉🎉🎉🎉🎉🎉🎈🙌

Byeeeeeeee,

Kim

We did it! Top 5 Reflections – Machine Learning Final Project @ Makers

The Final Project at Makers

For the final project at Makers, I chose Art/Music AI as my topic of choice. I was assigned to a team called ‘AJAK’ to build a project of our choice.

For our project, we ended up using a Convolutional Neural Network Machine Learning model to classify doodles. The aim was for the user to input a doodle and the model outputs a prediction on what the user has drawn. In our app, the user can draw a camera, crown or rabbit.

We all came into the project with little or no knowledge of Machine Learning. We only had one and a half weeks to complete the project, so it was a big achievement for us when we delivered our product on Demo Day!

You can check out our repo on Github!

Check out our app here: https://ajak-doodler.herokuapp.com/

AJAK Doodle App

We’re on Social Media!

If you missed the action, don’t worry! You can catch up via LinkedIn, Twitter or Facebook.

Check out the LinkedIn post

Here’s the Twitter post

We did it!!! 😍 @makersacademy thank you all, it’s been a blast and a great experience. Had so much fun on the group project #MachineLearning #Python #ArtificiallIntelligence #agile https://t.co/wtzT9HINOT

— Kim Diep (@thekimmykola) May 24, 2019

Missed the May 2019 Demo Day event @ Makers? You can watch the presentations on Facebook.

What’s it like to do a Machine Learning Project?

Here are my top 5 reflections:

#1 Machine Learning is flipping awesome!!!

I went into the project with some theoretical knowledge of Machine Learning, but no implementation know-how at all. Within 10 days, I fell in love with deep learning technologies and now feel equipped to do my own projects!

#2 Data acquisition and processing was a key part of the project

Even before the model could be trained, there was a lot of decision-making about where to get the data from and what format the source data was in, plus data exploration to see what was possible given the dataset. Data processing was important to get the data into the right format for our model.

#3 Building in Research & Development (R&D) time at the start of the project paid off

Given how little the team knew about Machine Learning, the first couple of days were spent on research. Whilst the other teams were putting code down, we hadn’t produced any code yet. This didn’t matter, as we had taken on a challenge and stuck to our team goals.

Personally, I learnt a lot from exploring a classification problem using the MNIST handwritten digits dataset (literally the ‘Hello World’ of Machine Learning) and doing some crash courses using online tutorials.

We learned together as a team, used the whiteboard to break down our problem and made sure every team member understood the domain and choice of model. We chose to use a Convolutional Neural Network (CNN) in the end!

Understanding Convolutional Neural Networks (CNN)

#4 Re-grouping as a team was useful to make informed decisions

There were a couple of moments in the project where we had to make pivotal decisions, weighing the pros and cons of the technical implementation against delivering our Minimum Viable Product (MVP).

Re-grouping as a team and diagramming ideas out made it easier to be on the same page and created the space for ideas to be generated and decisions to be made!

Deciding on our technical architecture

#5 Sharing the love for Agile!

Having daily stand-ups, retrospectives and valuing communication over processes helped us to apply Agile theory to Agile practice! This made our team gel a lot better and made our project more engaging to create with the end-user in mind!

Final Thoughts

We delivered a kick-ass interactive project!

Thank you to my team for the wonderful journey into Machine Learning! 🙂 You guys were awesome – a pleasure working with you all 🙂

Byeeeeeeee,

Kim