Timetrack: User Activity Prediction

Building an MVS for timetrack’s User Activity Prediction

In my previous blog, I shared some of my thoughts and proposals on how I would improve timetrack.io to solve some of the pain points I’m having with existing time-tracking technologies.

Since then, I have reached out to the chief developer of timetrack, who replied, discussed timetrack and the feature request in more detail, and provided a dataset I could use for User Activity Prediction.

In this post, I will discuss the steps I took to build a minimum viable solution for timetrack’s User Activity Prediction.

Note: I decided to go with a solution that’s easy to share and interpret, rather than rely on advanced analytics tools like Machine Learning (Occam’s Razor, y’all!)

Lo-fi (Pre-launch) Minimum Viable Solution

Overview

Use Tableau and Flourish to build a minimum viable solution (MVS) for timetrack’s user activity prediction feature. The feature would help users of the app get more accurate and reliable time series reports through human-in-the-loop interactive data annotation: specifically, semi-automated annotation of the user activity field whenever an untracked block of time is detected.

Problem Statement

Eliminate or reduce untracked blocks of time by making it easier and faster for users to label, annotate or categorize untracked time slots, through prediction and recommendation of the user’s next activity (or untracked activities) based on that user’s history.

Limitations

The solution cannot use any data other than the personal data of a user who has authorized timetrack.io to access it in order to enhance their in-app time-tracking experience.

Exploratory Data Analysis

Assumption: duration format is hh:mm

Summary statistics:

Mean number of seconds per Activity type is 8.3M

Average number of seconds per Activity type is 80.5M
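
I did this analysis in Tableau, but for readers who prefer code, here is a minimal Python sketch of how per-activity totals could be computed under the hh:mm assumption above. The function names and the sample rows are hypothetical (made up for illustration, not taken from the dataset). It reports both the mean and the median, since the two can sit far apart when a few activity types dominate.

```python
from collections import defaultdict
from statistics import mean, median


def duration_to_seconds(duration: str) -> int:
    """Convert an 'hh:mm' string to seconds (format is an assumption)."""
    hours, minutes = duration.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60


def total_seconds_per_activity(rows):
    """Sum the seconds logged for each activity type."""
    totals = defaultdict(int)
    for activity, duration in rows:
        totals[activity] += duration_to_seconds(duration)
    return dict(totals)


# Made-up rows for illustration only; not from the timetrack dataset.
sample_rows = [
    ("Sleep", "07:30"),
    ("Walk", "00:45"),
    ("Sleep", "08:00"),
    ("Development", "03:15"),
]

totals = total_seconds_per_activity(sample_rows)
summary = {"mean": mean(totals.values()), "median": median(totals.values())}
print(totals)
print(summary)
```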

Time Series Analysis

Activity types with peaks have always been Walk, Transport, Sleep, and Development, Q1 through Q4 of each year.

Activity types with peaks have always been Walk, Transport, Sleep, and Development, followed by Shopping, Cinema and Sport, throughout the years 2016 to 2021.

Digging into the activities with duration above average

From this graph, we can see that there has been a downward trend in duration since 2020. For this reason, let’s suppose that, at present, the duration and pattern of user behavior will follow those of 2020–2021 (i.e. the user’s most recent 2 years’ worth of data).
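
Restricting the analysis to the most recent 2 years could look something like the sketch below. The record layout (a dict with a `start` datetime) and the function name are assumptions on my part, not timetrack’s actual export format.

```python
from datetime import datetime


def most_recent_years(records, n_years=2):
    """Keep records that start within the latest n_years calendar years present."""
    if not records:
        return []
    latest_year = max(r["start"].year for r in records)
    cutoff_year = latest_year - n_years + 1
    return [r for r in records if r["start"].year >= cutoff_year]


# Made-up records for illustration only.
records = [
    {"activity": "Walk", "start": datetime(2016, 5, 1, 9, 0)},
    {"activity": "Sleep", "start": datetime(2019, 8, 3, 23, 0)},
    {"activity": "Transport", "start": datetime(2020, 2, 10, 8, 0)},
    {"activity": "Development", "start": datetime(2021, 6, 15, 10, 0)},
]

recent = most_recent_years(records)
print([r["activity"] for r in recent])
```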

From this graph, we see that the order of the top activities based on total duration does not change significantly over the years. It’s safe to say that this particular user’s activity revolves around these 4 activity types.

User Behavior Analysis

24-hour Usage Behavior

Even in a 24-hour usage analysis, trends emerge for the top activity types. They include the following (note that this graph covers the years 2016 to 2021):

  1. Walk is dominant during hours 9 to 22
  2. Sleep is dominant during hours 0 to 2 and 21 to 24
  3. Transport is dominant during hours 10 to 21
  4. Development is dominant during hours 10 to 12
  5. Coffee is dominant during hours 10 to 13
  6. Sport is dominant during hours 9 to 11 and 16 to 19

From the 24-hour graphs, we can see that there are typical “active hours” when there’s non-zero duration for the top activity types. The trends for 2021 are the following:

  1. Development is dominant during hours 8 to 12
  2. Walk is dominant during hours 8 to 11
  3. Transport is dominant during hours 6 to 16
  4. Sleep is dominant during hours 0 to 1 and 21 to 24

From the 24-hour graphs where the Top 4 Activity Types are excluded (to reveal non-obvious insights), we can see that there are typical “active hours” when there’s non-zero duration for other activity types. The trends for 2021 are the following:

  1. Sport is dominant during hours 8 to 19
  2. Aloggers is dominant during hours 10 to 15
  3. Shopping is dominant during hours 8 to 21
  4. Cinema is dominant during hours 14 to 15
  5. Housework is dominant during hour 7
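
For anyone who wants to reproduce the 24-hour view outside Tableau, here is a rough Python sketch. It assumes each record is an `(activity, start, end)` tuple (my assumption about the data layout), splits every entry at hour boundaries, and then lists the “active hours” where an activity has non-zero duration. The sample records are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def seconds_by_hour(records):
    """Return {activity: {hour_of_day: seconds}}, splitting entries at hour marks."""
    usage = defaultdict(lambda: defaultdict(float))
    for activity, start, end in records:
        cursor = start
        while cursor < end:
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(next_hour, end)
            usage[activity][cursor.hour] += (chunk_end - cursor).total_seconds()
            cursor = chunk_end
    return usage


def active_hours(usage, activity):
    """Hours of the day in which the activity has non-zero duration."""
    return sorted(hour for hour, secs in usage[activity].items() if secs > 0)


# Made-up records for illustration only.
records = [
    ("Walk", datetime(2021, 3, 1, 8, 30), datetime(2021, 3, 1, 10, 15)),
    ("Development", datetime(2021, 3, 1, 10, 15), datetime(2021, 3, 1, 12, 0)),
    ("Walk", datetime(2021, 3, 2, 9, 0), datetime(2021, 3, 2, 9, 30)),
]

usage = seconds_by_hour(records)
print(active_hours(usage, "Walk"))
print(active_hours(usage, "Development"))
```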

Weekly Usage Behavior

The user’s behavior for the top activity types appears to be uniform from Sunday to Saturday. Some interesting insights include the following:

  1. Walk increases in duration Tuesdays and Wednesdays
  2. All 4 activity types have means, medians, and upper and lower quartiles within the same range, for the most part.
  3. There are outliers in terms of duration for all 4 activity types, but even the outliers are within the same range, for the most part.
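
The box-plot numbers behind these observations (median and quartiles per weekday) could be computed along these lines. This is only a sketch: the `(date, seconds)` input shape, the function name and the sample values are hypothetical, and I picked the "inclusive" quartile method arbitrarily, so the results may not match Tableau’s exactly.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def quartiles_by_weekday(entries):
    """Return {weekday: [Q1, median, Q3]} of durations (seconds) for one activity."""
    by_day = defaultdict(list)
    for day, seconds in entries:
        by_day[WEEKDAYS[day.weekday()]].append(seconds)
    return {
        name: quantiles(values, n=4, method="inclusive")
        for name, values in by_day.items()
        if len(values) >= 2  # skip weekdays with too few points to summarize
    }


# Made-up Walk durations for illustration only.
walk_entries = [
    (date(2021, 3, 2), 3600),   # Tuesday
    (date(2021, 3, 9), 5400),   # Tuesday
    (date(2021, 3, 16), 7200),  # Tuesday
    (date(2021, 3, 3), 1800),   # Wednesday (only one point, so it is skipped)
]

walk_quartiles = quartiles_by_weekday(walk_entries)
print(walk_quartiles)
```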

Solution

Approach #1: Recommend Top Activity Types based on Total Duration above Average

This approach is the simplest and easiest to deploy.

The steps include the following:

  1. Compute the total number of seconds for each activity type
  2. Sort descending
  3. Identify the activity types that belong to the upper percentile (or above average) based on total number of seconds (let’s call this “top activity types”)
  4. Whenever an untracked block of time is detected, prompt the user to label it, offering only the top activity types as options in a dropdown
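
The steps above can be sketched in a few lines of Python. I’m using “above the mean” as the cutoff here, which is one reading of “upper percentile (or above average)”. The function name and the totals are made up for illustration; they are not the real numbers from the dataset.

```python
from statistics import mean


def top_activity_types(totals):
    """Activity types whose total seconds exceed the mean, sorted descending."""
    threshold = mean(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [activity for activity, seconds in ranked if seconds > threshold]


# Made-up totals (in seconds) for illustration only.
totals = {
    "Sleep": 800_000,
    "Shopping": 50_000,
    "Walk": 900_000,
    "Cinema": 20_000,
    "Development": 600_000,
    "Sport": 40_000,
    "Transport": 700_000,
    "Coffee": 10_000,
}

dropdown_options = top_activity_types(totals)
print(dropdown_options)
```

The returned list is already in descending order, so it can be used directly as the order of the dropdown options.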

Sample Use Case:

  1. Timetrack has detected an untracked block of time for the user.
  2. The timetrack app shows the user a pop-up with a dropdown to label the untracked time as any of the following (options in the dropdown should be in this order):

2.1 Walk

2.2 Sleep

2.3 Transport

2.4 Development
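
I don’t know how timetrack detects untracked time internally, so the following is purely a sketch of one way step 1 could work: sort the tracked entries and report the gaps between them. The `(start, end)` tuple layout, the function name and the 15-minute `min_gap` threshold are all my assumptions.

```python
from datetime import datetime, timedelta


def find_untracked_blocks(entries, min_gap=timedelta(minutes=15)):
    """Return (gap_start, gap_end) pairs between consecutive tracked entries."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    gaps = []
    latest_end = ordered[0][1] if ordered else None
    for start, end in ordered[1:]:
        if start - latest_end >= min_gap:
            gaps.append((latest_end, start))
        # Track the furthest end seen so overlapping entries don't create fake gaps.
        latest_end = max(latest_end, end)
    return gaps


# Made-up tracked entries for illustration only.
entries = [
    (datetime(2021, 3, 1, 8, 0), datetime(2021, 3, 1, 9, 0)),
    (datetime(2021, 3, 1, 9, 5), datetime(2021, 3, 1, 10, 0)),    # 5-minute gap: ignored
    (datetime(2021, 3, 1, 11, 30), datetime(2021, 3, 1, 12, 0)),  # 90-minute gap: reported
]

gaps = find_untracked_blocks(entries)
print(gaps)
```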

Approach #2: Recommend Top Activity Types based on Total Duration and Time-of-Day

This approach requires that the user has more than 24 hours’ worth of personal data.

The steps include the following:

  1. Compute the total number of seconds for each activity type by time of day, over a 24-hour cycle.
  2. Sort descending
  3. Identify the activity types that belong to the upper percentile (or above average) based on total number of seconds (let’s call this “top activity types”)
  4. Whenever an untracked block of time is detected, prompt the user to label it, offering only the top activity types as options in a dropdown
  5. Recommended options on the dropdown will depend on what time range the “untracked” blocks of time belong to.
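
A minimal sketch of the time-of-day variant, assuming the totals have already been bucketed by hour of the day. The `hourly_totals` layout, the function name, the fallback list and all the numbers are hypothetical. I also assume the untracked block is looked up by its starting hour, which is a simplification for blocks that span several hours.

```python
from statistics import mean


def recommend_for_hour(hourly_totals, hour, fallback):
    """Activity types at or above that hour's mean total, sorted descending."""
    totals = hourly_totals.get(hour)
    if not totals:
        # No history for this hour: fall back to the overall top activity types.
        return list(fallback)
    threshold = mean(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # ">=" (not ">") so that a lone activity in an hour is still offered.
    return [activity for activity, seconds in ranked if seconds >= threshold]


# Made-up per-hour totals (in seconds) for illustration only.
hourly_totals = {
    0: {"Sleep": 3600},
    9: {"Walk": 3000, "Development": 2400, "Coffee": 300},
    14: {"Transport": 1800, "Cinema": 1500, "Shopping": 200},
}
overall_top = ["Walk", "Sleep", "Transport", "Development"]

print(recommend_for_hour(hourly_totals, 9, overall_top))
print(recommend_for_hour(hourly_totals, 3, overall_top))
```

Falling back to the overall top activity types means the dropdown behaves like Approach #1 whenever an hour has no history yet.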

Sample Use Case:

  1. Timetrack has detected an untracked block of time for the user.
  2. The timetrack app shows the user a pop-up with a dropdown to label or categorize the untracked time as any of the following (options in the dropdown should be in this order). The options shown will depend on which time range the untracked time belongs to.

[Figure: decision table]

[Figure: User Activity over the years, a bar chart race created on Flourish]

Insights

This project made me realize how subject matter expertise, business sense and domain knowledge play a crucial part in a data science project. The approach I used in building an MVS was based on my knowledge of social physics, where human beings are compared to foraging animals. In other words, human behavior, specifically day-to-day foot traffic, follows a typical pattern or routine, and sudden drastic changes in the pattern signify “out-of-the-ordinary” events, such as when an individual has fallen sick, when there’s an outbreak within the area, or when natural disasters strike.

The “typical pattern” in this user’s case is the Walk-Sleep-Eat-Development routine. As a data scientist, I can relate, and so can my colleagues and most people I know who live in the city and make a living working on computer-based projects. Furthermore, just as the foraging behavior of animals changes in response to the environment where the animal lives, the “typical routine” for people in rural areas, especially in Geographically Isolated and Disadvantaged Areas (GIDA), would differ significantly from the routine of people living in urban areas.

Knowing that most humans follow a typical weekly pattern in terms of activities (work on weekdays, rest & recreation on weekends) and a typical pattern within 24 hours (sleep-eat-work-repeat), my approach was to look into the data by weekday and by hour. Without this knowledge, it would have taken more time to get to the conclusion. My experience in electrical engineering, finance, retail and other industries echoes this knowledge of human behavior and seasonality. This reminds me of all the projects where the data confirmed what we already knew by way of domain expertise (or in this case, common sense).

PS If you are an undergrad Filipina from a low-income family and you’re trying to land your first job in tech or data, please reach out to me, and I’ll help you #GetThatBread 😉

titaofdata.github.io | Product | Data | IoT | Automation | Decision Science