In this post I am going to use the data available for download from Google to estimate how much I slept during the last two years: When I went to bed, when I got up, and the sleep-durations - and how this changed over time with the changes in my life. This can also be interpreted as 'Look how much data Google has on me'.

As a new years resolution, I decided to try to read more (in addition to the ungodly amount of papers). The first book on my list was ‘Why we sleep’ by Matthew Walker. (I am aware of the criticism, but it still seemed like a good starting point on the topic of sleep to me). I do know that my sleeping habits are not great (or even good), but I am still assuming that I do sleep enough - I am not entirely sure though. I do self-qualify as a night-owl, but often have to get up early due to crew-training or work-meetings, which are (my assumed) main reasons for the nights where I do not sleep enough.

Luckily, I produce a lot of data - and (at least partially) thanks to the GDPR in the EU, I can simply request most of that information from the companies that helpfully keep track of that data for me. For this experiment, I ended up using mostly data from Google1 (which was in a surprisingly parseable json format) and a few WhatsApp chat histories.

1 The data that I ended up relying most on was from Google MyActivity - you can download yours here, and follow along. Note that I did not opt for a full download, but only the data that I thought would be helpful for my goal - figuring out my sleep. This resulted in: My Activity, Location History and YouTube.

Technical details

Data Examples

The data obtained from Google is in json, and perfectly timestamped (it does however not take daylight savings time into account, compared to the whatsapp-timestamps):

{
  "header": "YouTube",
  "title": "MEUTE - Customer is King (Solomun Rework) angesehen",
  "titleUrl": "https://www.youtube.com/watch?v\u003dndUMzZEiOZo",
  "subtitles": [{
    "name": "MEUTE",
    "url": "https://www.youtube.com/channel/UCY3cAFsquIk7VGMuk-V8S3g"
  }],
  "time": "2020-01-11T22:23:37.339Z",
  "products": ["YouTube"]
}

and the whatsapp chat histories2 is have timestamps in the following format:

22.10.19, 17:05 - Valentin: [Text]
22.10.19, 17:05 - Valentin: [Text]
22.10.19, 17:05 - [Name]: [Text]
22.10.19, 17:05 - Valentin: [Text]

2 WhatsApp data can be downloaded only one chat-history at a time, and at the time of writing, only usig the phone-client. There, the option is found in 'Settings' (in the respective chat) > 'More' > 'Export Chat'

Taking all of the data I used for my analysis gives the scatter-plot below

which contains about 140’000 datapoints.

Detection of sleep indicators

  • Detection of sleep time is simply the last active timestamp before no action is detected for an extended period of time.

  • For the wakeup time, I check if there is certain number of events occuring in a given timeframe - meaning that one interaction with the the phone/browser/whatever is not enough to be qualified as ‘awake’. This is due to the alarms that I regularly deactivate, and continue to sleep.

Both these assumptions mean that there must be a frequent interaction with any Google account, or texting someone right at sleep/wakeup. For me both these things are true - I usually read something on my phone/check my alarms/write to someone before sleeping, and do the same thing when waking up fairly quickly.

The following figure connects the resulting start and end times of the sleep using the indicators previously described:

The ‘sleep’ during the day are either times where I simply did not generate data, or times where I actually slept again after training (which did happen from time to time). The big gap in September ‘19 was from a trip to Mount Kilimanjaro where I did not have my phone with me.

In the end, the complete analysis relies on the assumption that one constantly generates some data, and extracts this - no knowledge of actual sleep happening is going on here.

Code

The script here produces a html file that gives a small aggregation of a few key indicators (average sleep time, distinction between weekend/normal weekday) and the possibility to specify custom ranges that should be analyzed separately. An example (the output of my sleep data) is below:

and can also be found here.

Hypotheses and analysis

I divided the data into three parts which mark distinct parts in my recent life: when I wrote my masters thesis, the time I was at BCG Gamma, and most recently the start of my PhD. I had the following hypotheses about the three separate times:

  • Masters Thesis at Verity Studios (09/18 - 04/19):
    • Wakeup times should be almost bimodal (with a peak around 5 and one around 9) due to early morning crew trainings, and me never really changing the time I got up on the other days to something more suitable to the mandatory trainings.
    • Wakeup times on the weekend should be earlier (around 7) than some the times during the week due to crew
    • In total, since I am usually more productive late at night (and this was the time where I worked on my Thesis), the sleep times are late, and my sleep duration is likely on the short side of 8 hours.
  • In Munich at BCG Gamma doing Data Science/Management Consulting (04/19 - 09/19):
    • I had early flights/train rides on Monday morning in both projects I was on (getting up around 5), this should be fairly visible
    • On the weekends I was very often working late on applications to PhD programs, and at some point also to regular jobs - this should lead to late sleep-times on the weekends.
  • My PhD (10/19 - now):
    • Wakeup on the weekend and weekdays should be similar (fairly consistent between 8.30 and 9.00) since crew here is only on weekends, and it is a bit later than in Zurich, but still early enough to lead to similar wake up times as during the week.
    • I am quite free in starting/end time of my work as long as I get things done, meaning that I should usually get around 8 hours of sleep - this routine is however sometimes disturbed due to meetings.

I also wanto to answer the question if I sleep ‘enough’ (lets define this as approximately 8 hours), and if there is a systems behind the days where I do not reach that ‘enough’ threshold.

How wrong are my assumptions?

Points 1-3 need an analysis on a day-level-aggregation for the time at Verity Studios:

Data distributions when aggregating day for the time at Verity. Note that the duration is the time from the sleep time of the previous day to the wake up time of this day.


  • Verity:
    • While some slight irregularity is visible for the wake up time around 5 in the first plot, the days with training are fairly obvious in the second plot: Tuesday, Thursday and Friday.
    • While on average not necessarily earlier than during the week, the peak of the distributions is definitely before some of the peaks during the week. The lack of a more extreme peak could be explained by the fact that early morning training only started halfway through the thesis, diminishing the effect.
    • Partially true - for both the weekends, and weekdays, the average duration is close enough to 8 hours. However, for both times of the week, it seems like there were days where sleep was definitely sacrificed for training and/or early morning meetings.

The next hypotheses benefit from comparisons between the time-slices, and rely on more aggregated views of the data. In the image below, I plotted a few distributions regarding the start and end time of my sleep, and the corresponding sleep-durations for all time-slices split into week and weekend.

  • BCG:
    • This is definitely true, and also visible in the plot above. It is also backed up by the day-level plot here.
    • While not extremely visible in the aggregated plot, the sleeping times are definitely later than during the week. (Which says quite a bit, since the times are already bad during the week.)
  • PhD:
    • The wakeup times during the week and on the weekend are on average fairly similar. I seem to be getting a little bit less sleep from Friday to Saturday, but in general, the plots do not differe too much here.
    • I seem to hover around 7.5 hours during the week, and a tad more on the weekends.

Conclusion

Overall, I was right with my initial assesment, and with most of my hypotheses - although the amount of sleep I currently get seems perfectly fine. It possibly is a bit on the shorter side, but given the fact that I usually wake up naturally even during the week, it seems like my body is content with the amount of sleep I get. The fact that I do not sleep much more on the weekends compared to weekdays supports this.

Through this project I actively checked and became arware of how much data I actually produce, and is stored about me. Will that change any of my habits? I do not think so - but it is still nice to come to that conclusion consciously, instead of simply accepting it without having all the information.

Outlook

There are still a few things that can be done with the dataset regarding the topic of sleep. I collect some of them below:

  • Show first interaction vs. computed wake-up time
    This would likely frustrating for me personally, since I am usually optimistic about all the things I am going to get done the next day = starting with waking up early. Reality usually hits with the first alarm that is going straight to snoozing.
  • Implementing a different approach for the computation of the wakeup time.
    A straightforward way to get a better feeling for the wakup times would be a probabilistic treatment all the way through - it would be possible to define interactions that are giving me more certainty that I am awake, and I would (over the course of a few timestamps) become eventually 100% sure that I am actually awake.
  • Different visualizations usually reveal other (interesting) things - happy to take suggestions here!