How I Got Into an Entry-Level Data Science Job

What I learned and how you could do it too

There are so many resources online on this topic. You have probably gone down that rabbit hole, and if you did, you are most likely overwhelmed by all the information.

Here’s a tl;dr on how I got my foot in the door as an entry-level data scientist:

  • I worked on a project that has an impact on a real-world problem and used machine learning in that project.
  • But how did you know it has an impact on a “real-world” problem? I went through my network and asked if they knew anyone in an industry with a problem that could be solved by tech / data. In my case, my co-research affiliate introduced me to a medical practitioner from PGH, and that doctor became the “user” I went back and forth with to get feedback on the (ML-based) project I was working on.

The doctor’s feedback included things like: this computation would make more sense, this feature should be the priority (instead of the other one), etc.

Here’s a verbatim piece of feedback the doctor gave me about my project: “During a cardiac rehab session, some arrhythmia events are natural and non-fatal. So, it’s not enough that we classify: Is there an arrhythmia event or not? We have to be able to classify which arrhythmia event occurred (e.g. bradyarrhythmias; premature, or extra, beats; supraventricular arrhythmias; and ventricular arrhythmias).” The doctor’s feedback helped me refine the problem statement (“the business problem”) and turn it into a machine learning problem (e.g. classify different types of arrhythmia events -> multi-class classification / supervised ML).
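To make that reframing concrete, here is a minimal Python sketch of what “multi-class classification” looks like. It is an illustration only, not the actual project code: the nearest-centroid approach, the two unitless features, and every number in the toy dataset are assumptions made up for this example. Only the four class names come from the doctor’s feedback.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Average the feature vectors of each class into one centroid per class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the class whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical, unitless toy features (NOT real ECG data).
X_train = [
    (0.1, 0.2), (0.2, 0.1),
    (0.9, 0.1), (1.0, 0.2),
    (0.1, 0.9), (0.2, 1.0),
    (0.9, 0.9), (1.0, 1.0),
]
# One label per row above: four event types instead of a yes/no answer.
y_train = [
    "bradyarrhythmia", "bradyarrhythmia",
    "premature_beat", "premature_beat",
    "supraventricular", "supraventricular",
    "ventricular", "ventricular",
]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, (0.95, 0.95)))
```

The point is the shape of the problem: a binary version would only have two labels (event / no event), while the doctor’s version needs one label per arrhythmia type.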

  • But how did I learn ML? Well, I did not take any formal courses in ML (because at that time, there were none for students in my degree), BUT I did seek out a mentor (working in the same research lab where I did my thesis) and asked her a lot of questions.
  • But how did she teach you? Well, she didn’t really spoon-feed me, but she helped me curate the best resources to get started in ML.
  • After consuming those resources, I just went ahead and got my hands dirty: I “hello world”-ed my way (through self-study!) into learning Python, ML, supervised learning algorithms, and more.
  • In other words, I refused to get stuck watching endless playlists of tutorials. I opened Spyder and tested out the code I encountered that was related to the project I was working on. That meant a lot of this on the command prompt:

git clone [insert link to repo]

How you can get started with this:

  1. Think of a project / topic / issue you are really passionate about.
  2. Create an account on GitHub
  3. Search for that topic on GitHub (in my case, I was interested in using ML for arrhythmia detection)

  4. Select a repository that piques your interest, click on that repo, then copy the repo’s URL
  5. For Windows users, press Start, then type cmd
  6. Open the command prompt and type:

git clone [link to repo]

When it’s finished, you should see something like this:

Then, if you navigate to that folder (in this example, it’s C:/Users/admin2), you’ll find the repo’s folder, which includes all the scripts and the README file.
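If you would rather do the same steps from Python (for example, inside Spyder), here is a rough sketch. The function names and the example URL are hypothetical. It assumes git is installed and on your PATH, and the folder-name logic only covers the common case where git names the folder after the last part of the URL.

```python
import subprocess
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def default_clone_folder(repo_url):
    """Guess the folder git creates: last URL segment without a '.git' suffix."""
    name = PurePosixPath(urlparse(repo_url).path).name
    return name[:-4] if name.endswith(".git") else name


def clone(repo_url, parent_dir="."):
    """Run `git clone <url>` inside parent_dir and return the new folder's path."""
    subprocess.run(["git", "clone", repo_url], cwd=parent_dir, check=True)
    return Path(parent_dir) / default_clone_folder(repo_url)


def list_repo_files(repo_dir):
    """List the files in the clone (scripts, README, ...), skipping git's metadata."""
    repo_dir = Path(repo_dir)
    return sorted(
        path.relative_to(repo_dir).as_posix()
        for path in repo_dir.rglob("*")
        if path.is_file() and ".git" not in path.relative_to(repo_dir).parts
    )


# Hypothetical usage (needs network access, so it is left commented out):
# repo_dir = clone("https://github.com/example-user/example-repo.git")
# print(list_repo_files(repo_dir))
```

Listing the files first is a quick way to spot the README and the main scripts before you start running anything.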

Typically, what I would do is go through all the files and try to run the scripts. If I encountered errors, I would go on Stack Overflow to search for solutions to the error I was getting. After I could successfully reproduce all the steps in the README file or in the researchers’ / developers’ white paper, I would go ahead and input my own dataset, or play with the hyperparameters, etc. In other words, I would break the code, re-engineer it, and input my own data; I played with it so I could understand what was going on and how it works inside out.
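Here is a small sketch of what “playing with the hyperparameters” can look like. Everything in it is an assumption for illustration: the tiny k-nearest-neighbours classifier, the toy dataset, and the choice of k values are made up, and they stand in for whatever model and data the repo you cloned actually uses.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict by majority vote among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and score the hits."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Toy dataset (made up): two well-separated clusters of three points each.
DATA = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]

# The "hyperparameter" here is k; try a few values and compare.
results = {k: leave_one_out_accuracy(DATA, k) for k in (1, 3, 5)}
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
```

On this toy data, k=1 and k=3 get every held-out point right, while k=5 gets every one wrong, because each held-out point has only two same-class neighbours left and three from the other class. Breaking things like this on purpose is a quick way to see what a setting actually does.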

  • After practicing my programming skills and brushing up on ML concepts and statistics, I made sure I was able to showcase all of this on my CV. My CV includes the following:
  1. The impact of the ML project I built (e.g. has 92% accuracy in detecting irregular heartbeats, reduces the time it takes for medical professionals to detect arrhythmia from 30 mins to 10 seconds)
  2. How I built it (this is where the technical details come in; this is where I was able to enumerate all the skills I developed and all the tools / tech stack I was able to explore and use)
  3. What problems I encountered and how I approached solving them
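If you are wondering how impact numbers like the ones in item 1 are computed, here is a minimal sketch. The 30-minute and 10-second figures come from the example above; the labels and predictions are hypothetical placeholders chosen so that 23 out of 25 match, and they are not real project results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def speedup(before_seconds, after_seconds):
    """How many times faster the new process is than the old one."""
    return before_seconds / after_seconds


# Hypothetical labels: 25 test cases, 23 predicted correctly.
y_true = ["irregular"] * 25
y_pred = ["irregular"] * 23 + ["regular"] * 2
print(f"accuracy: {accuracy(y_true, y_pred):.0%}")

# From 30 minutes down to 10 seconds.
manual_seconds = 30 * 60
model_seconds = 10
print(f"speedup: {speedup(manual_seconds, model_seconds):.0f}x")
```

Going from 30 minutes (1,800 seconds) to 10 seconds works out to a 180x reduction, which is the kind of concrete number that is easy to put on a CV.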

  • The slide deck above from Amber Teng helps tremendously!

tl;dr: work on a (personal) project, seek a mentor, and update your CV to reflect the impact of your project, the skills you developed, and the tools you used.

PS: I will update this as I go. If you are an undergrad Filipina from a low-income family and you’re trying to land your first job in tech / data, please reach out to me, and I’ll help you #GetThatBread 😉

If you’re interested in finding out more about the data science interview process, here’s my previous blog on ACTUAL data science interview questions.

titaofdata.github.io | Product | Data | IoT | Automation | Decision Science