Setting Up as a Data Scientist

Sweet! You just got a new laptop. What should you install?


After working as a professional data scientist for some time, I was finally able to afford my own laptop. I decided to buy one because of what I realized during the ECQ (enhanced community quarantine): everyone in our household needs a laptop, but we only had one usable machine, and that is the laptop lent to me by our company. This meant we all had to take turns and wait for each other to finish before we could use it.

Since my mom is now working from home and I am quite sure that my little sister will be having online classes this year (while there is still no vaccine against COVID-19), I wanted to make sure we would have a family laptop.

My laptop currently has 8GB of RAM and a 256GB SSD. I was already sure that I needed at least 8GB of RAM for the type of work I do (production-ready apps and analyses for machine learning, AI, engineering and IoT), and I had already done my homework comparing 8GB versus 16GB of RAM (TL;DR: the difference is not noticeable for mere mortals).

I also got a licensed copy of Microsoft Office 365 for free through our university subscription. And if anyone is wondering: yes, I bought this laptop secondhand, from a French guy who got stuck in the Philippines because of the ECQ. He said he was selling it because he found it difficult to get used to the QWERTY keyboard layout; in France, they use the AZERTY layout.

Plus, I have to be honest — as a Hamilton fan, I was extremely charmed to hear him count the bills I gave him… in French! (Un, deux, trois, quatre, cinq, six, sept, huit, neuf….)

Installations

Here are the things I installed on my computer as soon as I had it on hand (all of these are free to download, and many are open-source!)

  1. Anaconda — Individual
    This software is free and already comes with the tools that most data science professionals use, like Spyder, Jupyter, pip, etc.
  2. Git
    For most of the projects I have done, I have lost count of how many times I ran git clone to explore how I could execute the project in the fastest way possible (in keeping with the First Principle of Scrum). In fact, most of the commands in my command prompt history are pip install (some package here) and git clone (some repo here).
  3. VS Code, Sublime Text 3, Kite for Python: IDEs and plugins for Software Development

Out of the three, VS Code is the one I use most often, next to Jupyter notebooks and Spyder.
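A quick way to confirm that the command-line tools from the list ended up on your PATH is a few lines of Python. This is only a minimal sketch: the command names in TOOLS are assumptions (conda for Anaconda, code for the VS Code launcher), and find_tools is a helper name made up for illustration, so adjust both to match what you installed.

```python
import shutil

# Assumed command names; edit this list to match your own installs.
TOOLS = ["conda", "git", "pip", "code"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        # Print the resolved location, or flag the tool as missing.
        print(f"{name:<6} {path or 'NOT FOUND'}")
```

If a tool shows up as NOT FOUND right after installing it, opening a new terminal window usually picks up the updated PATH.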

Very recently, I decided to commit to the #100DaysOfCode challenge in my quest to become a better Data Scientist by going full stack and learning the ropes of Cloud Development. At present, the company I work with is using Google Cloud Platform, while the non-profit we support is on its way to integrating AWS into the curriculum. Either way, cloud development is now essential in my career. Exciting times!

  4. DS Tools
    Here’s a list of the pip install commands I ran to obtain the data science toolkits I have used very often in building data science projects in the past:

pip install streamlit
- I use this when I want to deploy a web app with a machine learning backend, either locally on my computer or on Heroku. As someone who has used Flask to build web apps in the past, I find streamlit way faster, easier, and more intuitive for beginners when building machine learning web apps (you don’t need prior experience with HTML, CSS or JavaScript to build a web app with it).
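To give a feel for how little code a streamlit app needs, here is a minimal sketch. It is not one of my actual projects: predict_score is a made-up stand-in for a trained model, and its inputs and weights are invented purely for illustration. Save it as app.py (an assumed file name) and launch it with streamlit run app.py.

```python
# app.py: launch with `streamlit run app.py`
try:
    import streamlit as st
except ImportError:
    # Lets predict_score be imported and tested on machines without Streamlit.
    st = None


def predict_score(hours_studied: float, hours_slept: float) -> float:
    """Hypothetical stand-in for a trained model's predict step."""
    raw = 40.0 + 5.0 * hours_studied + 2.0 * hours_slept
    # Clamp the result to the 0-100 range.
    return max(0.0, min(100.0, raw))


def main() -> None:
    st.title("Toy score predictor")
    # Each slider returns the value currently selected in the browser.
    studied = st.slider("Hours studied", min_value=0.0, max_value=10.0, value=4.0)
    slept = st.slider("Hours slept", min_value=0.0, max_value=12.0, value=7.0)
    st.write("Predicted score:", predict_score(studied, slept))


if st is not None:
    main()
```

The sliders, title and output are all plain Python calls, with no HTML, CSS or JavaScript involved. Swapping the toy function for a real model would mean loading the model once and calling its predict method inside predict_score.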

Even though I have been using Google Data Studio recently, since most of our data sources are connected to Google services (e.g. Google Sheets), other awesome viz tools that startups and large corporations use are the following:

  • Tableau: there’s a free version of Tableau called Tableau Public. Just remember that every viz on your Tableau Public account is accessible to the public (hence the name).
  • Flourish: another awesome viz tool, similar to Tableau. It’s online and pretty intuitive to use.
  • Power BI: a viz tool from Microsoft (Power BI Desktop is free), and I’ve seen a lot of companies use it for their recurring reports (e.g. monthly KPI reports). In terms of how user-friendly and intuitive the interface is, I will say this: Power BI is to Adobe Photoshop as Tableau is to Canva.

5. Stand-alone Data Science Software

RapidMiner (for ultra fast end-to-end ML), MicroStrategy Desktop, Weka, Trifacta (for data wrangling)

6. Database and Reporting Tools

My sister uses MS Office apps for most of her homework (I’m surprised by how different Google Sheets and Google Slides are from the local MS Office apps), so this is still essential for us. For this, I installed Microsoft Office 365 and activated it by logging in with the credentials given to us by our university. If you are currently enrolled in a higher ed institution, ask your university’s computer helpdesk about it.

For my spreadsheet needs, I am shifting from Google Forms to Airtable. Besides it being my boss’s preference, I also find its customized views very helpful: a view can make a row look like a form when you fill it out. That way, data entry becomes easier, and Airtable keeps adding input to a database with a million rows pleasing to the eyes.

I’ll update this as I go. Ciao!

titaofdata.github.io | Product | Data | IoT | Automation | Decision Science