Ideas from the book

‘Everybody lies’ by Seth Stephens-Davidowitz

If you are into data science, computational social science, social computing, big data, or any of the other fancy terms for applying data science to the large datasets we constantly collect nowadays, then you will likely, like me, find this book interesting.

Seth earned his PhD in economics. His ideas and approaches to big data are so novel and out-of-the-box, at least from the perspective of a non-economist like me, that it does not surprise me that he has also held positions at Harvard, Google and The New York Times.

Summary of the findings in the book:

Are Freudian slips real?

The simple answer is NO. The frequency with which people make mistakes such as typing “fuckiest” instead of “funniest” or “penestrian” instead of “pedestrian” online turns out to be no higher than the frequency with which a bot making random typos would produce such mistakes.

However, Freud might have been right about something else: the Oedipus complex.

Is incest present in sexual fantasies?

Such fantasies are present in 9% (women) to 16% (men) of all porn site searches.

Another seemingly random question the book explores concerns successful basketball players. Is it really true that they are more likely to come from difficult family and neighborhood backgrounds?

Again, the answer is NO. On the contrary, kids from middle-class or well-off families have a higher chance of succeeding in basketball.

This question is my favorite, and it has nothing to do with digital big data as we have come to think of it: what makes a great racehorse?

For many years, people have collected diverse data about racehorses, including their parentage, how they behave at certain times, and various physical attributes. However, one man had the idea to measure their heart size, in particular the left ventricle. This turned out to be the single best predictor of the most successful racehorses. Based on this finding, he convinced the owner of the future champion American Pharoah not to sell the horse at a point when the owner intended to.



Social Networks

Social Network Analysis (SNA), one of the most fruitful fields in computer science today, has proliferated thanks to the many online social networks and the active engagement of their users. Think of Twitter, Facebook, LinkedIn, Flickr, Swarm, etc. In particular, SNA has made it possible to analyse some classical sociological theories in this new, online context. This calls for a synergy between sociology and computer science, and a new field, termed Computational Social Science, has emerged. Our research belongs to this field.

Homophily in communication

Homophily is the tendency of similar individuals to connect. A famous saying captures it simply: birds of a feather flock together. Homophily has long been known in sociology; however, recent SNA studies have confirmed and quantified it at a large scale and in diverse settings.

We investigated homophily in Twitter communication on the basis of semantic features of users’ communication content, and also based on their social status in the Twitter network.

  1. For semantics, we asked whether users who generally talk about similar topics talk more to each other. We then looked at specific topics and measured for which of them this tendency is more pronounced. We also found that users with similar sentiment talk more to each other.
  2. For social status, the question is simply whether those who are more central and important in the social network tend to talk to others who are also more central and important. The answer is again positive.
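Both questions can be quantified in a few lines. The sketch below assumes each user is summarized by a topic vector (e.g., averaged topic-model weights) and a single numeric status score (e.g., a centrality value); cosine similarity and Pearson assortativity are illustrative choices of ours, not necessarily the exact measures used in the paper, and all function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_homophily(topics, links):
    """Mean topic similarity of linked pairs minus that of unlinked pairs.

    topics: dict user -> topic vector (assumed input format)
    links:  set of frozenset({u, v}) pairs of users who mention each other
    A positive value means users who talk to each other talk about similar things.
    """
    linked, unlinked = [], []
    for u, v in combinations(sorted(topics), 2):
        sim = cosine(topics[u], topics[v])
        (linked if frozenset((u, v)) in links else unlinked).append(sim)
    return sum(linked) / len(linked) - sum(unlinked) / len(unlinked)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def status_assortativity(status, links):
    """Correlation of a status score across the two ends of every link.

    Each undirected link is counted in both directions, so the measure is symmetric.
    Values near +1 mean central users mostly talk to other central users.
    """
    xs, ys = [], []
    for pair in links:
        u, v = tuple(pair)
        xs += [status[u], status[v]]
        ys += [status[v], status[u]]
    return pearson(xs, ys)
```

Comparing linked against unlinked pairs gives a simple baseline; a more careful analysis would compare against randomized networks.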

While these results are expected given prior knowledge from sociology, we extended the insights to the relationship between social status and the semantic features of user tweets. In that regard, we found that more active and popular users tend to use more diverse semantic content. At the same time, the most active users tend to express negative sentiment in their tweets.
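One common way to quantify "semantic diversity" is the Shannon entropy of the topics a user tweets about. The sketch below assumes each tweet has already been assigned a single topic label (a hypothetical input format); it illustrates the idea rather than reproducing the exact measure from our paper.

```python
import math
from collections import Counter


def topic_entropy(tweet_topics):
    """Shannon entropy (in bits) of the topic labels of one user's tweets.

    0 means the user always tweets about one topic;
    log2(k) means the user spreads tweets evenly over k topics.
    """
    counts = Counter(tweet_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Computing this value per user and correlating it with activity or popularity (e.g., number of tweets or followers) is one way to test whether more active users use more diverse content.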
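Mention-based links can be given a strength and a lifetime by counting mentions per user pair in consecutive time windows. The sketch below is a simplified illustration under our own assumptions (numeric timestamps, fixed-length windows, undirected pairs); the function names are hypothetical and the paper's exact definition of link formation and dissolution may differ.

```python
from collections import Counter, defaultdict


def links_per_window(mentions, window):
    """Count mentions per undirected user pair in consecutive time windows.

    mentions: iterable of (timestamp, source, target) with numeric timestamps
    window:   window length, in the same unit as the timestamps
    Returns {window_index: Counter({frozenset((u, v)): strength})}.
    """
    windows = defaultdict(Counter)
    for ts, src, dst in mentions:
        if src != dst:  # ignore self-mentions
            windows[int(ts // window)][frozenset((src, dst))] += 1
    return windows


def formed_and_dissolved(windows, w):
    """Links active in window w but not in w-1 (formed), and vice versa (dissolved)."""
    prev = set(windows.get(w - 1, {}))
    cur = set(windows.get(w, {}))
    return cur - prev, prev - cur
```

Topic similarity and status difference can then be compared between the formed links, the dissolved links, and the links that persist.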

Šćepanović S., Mishkovski I., Gonçalves B., Nguyen Trung Hieu, Hui P. “Semantic homophily in online communication: evidence from Twitter”, Online Social Networks and Media, Elsevier, 2017 (to appear).

Smart Energy Grid

Residential setting: CIVIS

Most of our research in this field concerned the smart grid in a residential setting, as part of the CIVIS project. The main aim of CIVIS was to develop a social energy app that shifts household energy practices towards more sustainable ones.

Learning from others

First, we performed an extensive literature review to understand what has and has not worked in the energy interventions conducted so far.

Lean, iterative and innovative approach

In parallel, we created mock-ups and conducted user studies at several of the partner universities as part of an iterative, user-centered app design process.

Originally, the name for the app was EnergyUp. At a later stage, we selected YouPower as a more appropriate name.

What we did at Aalto as part of CIVIS

The user study and the startup/innovation project we did at Aalto are described under YouPower on this website.

How we designed the CIVIS app in the end

There is also an external CIVIS project link for the YouPower app.

The publication about the design:

Yilin Huang, Hanna Hasselqvist, Giacomo Poderi, Sanja Scepanovic, Filip Kis, Cristian Bogdan, Martijn Warnier and Frances M. T. Brazier, YouPower: An Open Source Platform for Community-Oriented Smart Grid User Engagement, in: Proceedings of the 14th IEEE International Conference on Networking, Sensing and Control, IEEE, 2017.


Industrial setting: Green Big Data

During my visit to CERN, we also worked on improving energy efficiency, this time of a data centre. We received a dataset with energy consumption and computing statistics from a large computing centre, CSC – IT Center for Science. CSC provides computing services to the Finnish scientific community, but also to physicists from other countries, as it belongs to Tier-2 of CERN’s Worldwide LHC Computing Grid.

During this visit, I collaborated with:

We looked at the correlations between application- and system-level logs and the energy consumption of the data centre. We clustered the computing nodes based on their vmstat and RAPL variables, and we showed that the energy consumption of a node can be estimated from these variables.

Our results were accepted at the Workshop on Energy-Aware High Performance Computing, EnA-HPC 2017.
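The estimation step can be illustrated with an ordinary least-squares fit. This is a minimal sketch assuming a linear model and hypothetical per-node features (for instance, vmstat CPU user and system percentages plus a RAPL package-power reading) against measured node power; the model and feature selection used in the paper may differ. It uses only the standard library, so for real data `numpy.linalg.lstsq` would be the more robust choice.

```python
def fit_linear(X, y):
    """Ordinary least squares y ≈ b0 + b1*x1 + ... solved via the normal equations.

    X: list of feature rows, one per node and time step (hypothetical features)
    y: measured node power, e.g. in watts
    Returns the coefficient list [b0, b1, ...].
    """
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, row):
    """Estimated power for one feature row."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))
```

Fitting on part of the nodes and checking `predict` on the rest gives a quick sense of how well the variables explain consumption.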

Kashif Nizam Khan, Sanja Scepanovic, Tapio Niemi, Jukka K. Nurminen, Sebastian Von Alfthan, Olli-Pekka Lehto. “Analyzing the Power Consumption Behavior of a Large Scale Data Center”, Workshop on Energy-Aware High Performance Computing, EnA-HPC, June 2017.

Human dynamics

It was a privilege to get the chance to analyse what was, at the time, the largest mobile phone dataset released to the research community. Orange Telecom released data from Cote d’Ivoire for the Data for Development (D4D) challenge in 2012.

So we asked: could such anonymized mobile communication data (call timings, locations and person IDs) serve as a socio-economic proxy indicator for the country? The answer is yes.

Mobility <–> communication frequency

For instance, from the averaged frequency and length of communication, one can clearly observe important events in the country and correlate them with people's mobility. In Figure 1, black shows mobility (calculated from calling locations), red shows calling frequency and blue shows call duration. We can see from Figure 1 that mobility and calling frequency correlate (we also calculated the correlation to confirm this). Interestingly, call duration follows a different pattern and does not correlate positively with either frequency or mobility. Our conclusion is that people on the move make more calls, but when they want to make longer calls, they prefer to be in one place. Not that surprising when one thinks about it.

Moreover, in all 3 activities one can easily identify the New Year, Christmas and Easter. Without prior knowledge, we could infer from these anonymized data alone that Cote d’Ivoire is a country in which religion is important (for a large part of its inhabitants).

Graph (b) in Figure 1 shows the averaged daily traveled distance. We identified that its 3 peak periods match the parliamentary elections of 11 December 2011, the 2012 Africa Cup of Nations in football, where Cote d’Ivoire played in the final, and the Bouake Carnival and Fete du Dipri.

Figure 1: Mobility vs. calling frequency and duration
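The correlation check and the holiday spotting described above can be sketched as follows, assuming one aggregated value per day for each series (e.g., average traveled distance, average number of calls, average call duration). The z-score threshold for flagging a "peak" day is an illustrative choice of ours, not the method used in the paper.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length daily series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def peak_days(series, z=2.0):
    """Indices of days whose value is more than z standard deviations above the mean.

    A crude event detector: holidays, elections or major matches show up as outliers.
    """
    mu, sd = statistics.mean(series), statistics.pstdev(series)
    return [i for i, v in enumerate(series) if sd and (v - mu) / sd > z]
```

A positive `pearson(mobility, call_counts)` together with a near-zero or negative `pearson(mobility, call_durations)` would reproduce the qualitative pattern described above.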

Economic status <–> radius of gyration

Another interesting finding is that a relatively simple quantification of human mobility, such as radius of gyration, can tell us a lot about different regions in the country.

This African country has its economic and development center in the city of Abidjan, on the south-east coast, while the northern and western parts are less developed and, on many indexes, considered poor. The radius of gyration measures how far people travel on average (a very simplified interpretation, but it serves our purpose). On map (a) in Figure 2, darker colors mark the regions in which people have a larger radius of gyration.

Now it is apparent that people in and near Abidjan have a relatively low radius of gyration, showing that they do not travel far from their home location, while people from the poor regions have considerably larger radii of gyration. That is because they do not have all the necessary services (hospitals, schools, ports) nearby, and they need to travel farther and more frequently to fulfill their basic needs. Moreover, the whole country clearly seems to gravitate towards the wealthy south-east coast and Abidjan.
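The radius of gyration has a standard definition: the root-mean-square distance of a user's recorded locations from their centre of mass, r_g = sqrt((1/n) Σ |r_i − r_cm|²). The sketch below assumes each call is mapped to the (lat, lon) of its cell tower and uses a local equirectangular projection, which is an approximation we chose because it is adequate for distances within one country.

```python
import math

EARTH_RADIUS_KM = 6371.0


def radius_of_gyration(points):
    """Radius of gyration (km) of one user's recorded locations.

    points: list of (lat, lon) in degrees, one entry per call, so frequently
    visited places weigh more. Coordinates are projected onto a local plane
    (equirectangular approximation) before computing the centre of mass.
    """
    lat0 = math.radians(sum(lat for lat, _ in points) / len(points))
    xy = [(EARTH_RADIUS_KM * math.radians(lon) * math.cos(lat0),
           EARTH_RADIUS_KM * math.radians(lat)) for lat, lon in points]
    cx = sum(x for x, _ in xy) / len(xy)  # centre of mass, x
    cy = sum(y for _, y in xy) / len(xy)  # centre of mass, y
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in xy) / len(xy))
```

Averaging this value over the users whose home lies in a given region yields per-region statistics of the kind shown in Figure 2.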

In graph (b) on the right of Figure 2, we averaged the radius of gyration for 3 regions:

  1. Wealthy Abidjan,
  2. Poor North-West,
  3. The whole Cote d’Ivoire.

Our aim is to show that this measure of human mobility, which has otherwise been shown to be consistent over time within a country, differs across regions and can serve as a fingerprint of economic status.

Figure 2: User radius of gyration statistics

Administrative units and economic centers <–> commuting

Finally, to my own surprise, using only the communication data, we were able to find users' home and work locations and, based on those, to compute the commuting network. Applying a common network partitioning algorithm (modularity-based community detection), we identified commuting regions that match the administrative regions of the country remarkably well. For the areas where the obtained commuting regions do not match (mostly the red, blue, green and cyan ones), the reasons are easy to identify: the borders are distorted by the economic centers (Abidjan, Yamoussoukro, Gagnoa) that attract commuters.

The left map in Figure 3 shows the important economic centers, identified by running the standard PageRank algorithm on the commuting network.
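A simple way to infer home and work locations from call records is to take the most frequent location at night and during working hours. The sketch below is a hypothetical heuristic of ours: the hour windows, the (hour, weekday, location) record format and the function names are assumptions, not the exact procedure from the paper.

```python
from collections import Counter


def home_and_work(calls, night=(19, 7), work=(9, 17)):
    """Guess one user's home and work location from (hour, weekday, location) calls.

    home = most frequent location at night (hour >= 19 or hour < 7 by default);
    work = most frequent location on weekdays (0-4) within working hours.
    """
    start, end = night
    night_locs = Counter(loc for hour, _, loc in calls if hour >= start or hour < end)
    work_locs = Counter(loc for hour, day, loc in calls
                        if day < 5 and work[0] <= hour < work[1])
    home = night_locs.most_common(1)[0][0] if night_locs else None
    office = work_locs.most_common(1)[0][0] if work_locs else None
    return home, office


def commuting_edges(users):
    """Weighted, directed commuting network: (home, work) -> number of commuters."""
    edges = Counter()
    for calls in users.values():
        home, office = home_and_work(calls)
        if home is not None and office is not None and home != office:
            edges[(home, office)] += 1
    return edges
```

The resulting weighted edges can then be passed to a modularity-based community detection routine (for instance NetworkX's `greedy_modularity_communities`) to obtain commuting regions.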
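On a commuting network, PageRank scores a location highly if many commuters, especially from other highly ranked locations, travel to it. Below is a minimal power-iteration sketch on a weighted, directed home-to-work graph; the damping factor 0.85 is the conventional default, and the dict-of-edges input format is an assumption of this sketch.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    edges: dict {(src, dst): weight}, e.g. number of commuters from home to work.
    Rank of nodes without outgoing edges (dangling nodes) is spread uniformly.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        # Teleportation plus redistributed dangling rank
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its rank along outgoing edges, proportionally to weight
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

Sorting locations by their score and keeping the top few gives candidate economic centers; NetworkX's `pagerank` function offers an equivalent, well-tested implementation.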


Figure 3: Regions and centers of commuting importance


While many of the ideas in this work build on what we already knew about this particular country, at least a few points amazed me, as a lover of data analysis, and convinced me of its power:

  • How often we call follows a different pattern from how long we call.
  • How we move has to do with how wealthy we are.
  • We are free, but are we aware that we still move a lot under some invisible constraints?

For more details of our analysis, have a look at our PLOS ONE article.

Šćepanović, S., Mishkovski, I., Hui, P., Nurminen, J.K., Ylä-Jääski, A., “Mobile Phone Call Data as a Regional Socio-economic Proxy Indicator,” PLOS One, 2015.


Our teams at a total of 12 partner institutions, including Aalto, TU Delft and KTH, are working together on the EU project CIVIS to design a social smart energy application and platform.



On 13 June 2015, EnergyUP will participate in the Helsinki OSCE Days. OSCE Days is a global event devoted to the Open Source Circular Economy, taking place in more than 30 cities around the world, and we are happy and privileged to take part in it!
We are preparing a showcase demo and a user study for everyone interested in helping us make this app useful for them.


CIVIS team during the OSCE days Helsinki 2015:

  • Rasmus Eskola
  • Srinivaasan Gayathri
  • Yuri Barssi
  • Andrea Vianello
  • Sanja Šćepanović


In May 2015, we were selected for the finals of the Climate Launchpad competition in Finland. As one of 12 finalists, we will pitch at the Finnish finals event on 8 June, aiming to take our idea to the next level: the EU competition round.

In March 2015, our solution won the second-place award in the behavior change category of the first Aalto Energy Efficiency competition.

About EnergyUP

Our work focuses on developing a social network solution (an app) that helps people become aware of their energy actions and decisions by visualizing their consumption and comparing it with that of others. The idea is that behavior change is fostered by small, often common, actions and challenges.

We are currently investigating existing research and practical approaches, and prototyping our solutions using a lean approach with iterations of user testing. We have thus iterated over several versions of our prototype.

The latest version, merged with the prototype developed by our partners from TU Delft, Yilin Huang and Prof. Martijn Warnier, is now available as PROTOTYPEv2.

Aalto CIVIS team:
TU Delft team:
  • Yilin Huang
  • Martijn Warnier
KTH team:
  • Hanna Hasselqvist
  • Filip Kiš
  • Cristian Bogdan
Trento team:
  • Daniele Miorandi
  • Giacomo Poderi

We presented a poster on EnergyUP (now renamed YouPower) at the ICCSS conference:

Huang Y, Šćepanović S, Miorandi D, Warnier M and Brazier F (2015). Towards Smart Grid User Engagement Through Social Networking. ICCSS International Conference on Computational Social Science 8-11 June 2015.