I Like Big Data and I Cannot Lie

Big data, it’s become a buzzword, but what is it exactly? And why does it matter?

Big data, according to Google, is defined as extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions, but also extending to physical data.

In today’s technology-driven world, where smart devices are a dime a dozen, big data plays a critical role in improving services and allowing us to make smarter, informed decisions – in anything, be it in business, healthcare, the arts or for community engagement. It can come in the form of images, video, 3D data, text, social media log data and geo-data (i.e. geo-tags).

It forms the backbone of concepts such as the Internet of Things, and its new incarnation, the Internet of Everything, where data is continuously streamed and analysed from multiple devices to create a better workflow, increase efficiency or reach a specific audience at just the right moment.

We see the effects of big data in small ways every day, like Google Now notifying me that there’s an Age of Ultron screening tonight at 6pm and, with traffic density determined by the number of smartphone users on the route to the cinemas, I need to leave by 5:42pm to get there on time – all generated because I watched the trailer on YouTube last night.

Related: From little things, the Internet of Things grows.

The Big Bang of Data

The International Data Corporation (IDC) predicts that by 2020 the digital universe will reach 40 zettabytes (ZB) or 40,000,000,000,000,000,000,000 bytes. To put this into scale, that’s 57 times the number of grains of sand on all the beaches on Earth.

When last measured in 2012, the digital universe was at 2.8 ZB. From 2010 to 2012 our digital universe doubled, and will continue to double every two years between now and 2020 – which is close to a 50-fold increase since 2010.
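
The doubling described above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the 2.8 ZB measured in 2012 doubles exactly every two years, and the function name and parameters are illustrative rather than anything the IDC uses.

```python
# Project the size of the digital universe, assuming it doubles every
# two years from the 2.8 ZB measured in 2012 (figures from the article;
# the function and its parameters are hypothetical, for illustration).

def projected_size_zb(year, base_year=2012, base_size_zb=2.8,
                      doubling_period_years=2):
    """Return the projected size in zettabytes for a given year."""
    doublings = (year - base_year) / doubling_period_years
    return base_size_zb * 2 ** doublings


for year in (2012, 2014, 2016, 2018, 2020):
    print(year, round(projected_size_zb(year), 1), "ZB")
```

Under that strict doubling assumption, 2020 comes out at roughly 45 ZB, which is in the same range as the IDC's 40 ZB prediction.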

Big Things in Small Sizes

Despite the volume of data we are generating, the IDC estimates that only 0.5 per cent of the world’s data is being analysed, even though 23 per cent of it could be useful if we could locate it and examine it.

But even with 0.5 per cent, that’s still a lot of data. How do you possibly make sense of all that? How do you use it and what can you learn from it? And how do you communicate it to a lay audience?

Curtin’s Hub for Immersive Visualisation and eResearch (or HIVE for short) is a good place to start. The HIVE was built from the recognition that there was a growing need for new and improved tools to interpret, present and communicate big data.

It features four large-scale visualisation systems that allow the visualisation of different types of data, in varying volumes, scales and formats. With the aid of technical staff, the HIVE has become a space where researchers can visualise their data to achieve greater insights and increase the usefulness and relevance of their research.

Related: Researchers abuzz over HIVE and immersive visualisation.

Seeing the Bigger Picture

For Associate Professor Kay O’Halloran and Dr Sabine Tan of Curtin’s School of Education and the Curtin Institute for Computation, big data offers an opportunity to examine and develop our digital techno-culture.

Digital techno-culture investigates the relationship individuals, groups and societies have with technology. In the case of O’Halloran and Tan’s research, it also involves investigating how online media, social media and ‘smart’ technology are changing the way we communicate – and in turn changing our society and culture.

Their research, self-described as a bridge between science and humanities, takes the scientific analyses of big data (such as when and where you’re most likely to take an Instagram “selfie”) and views them through the lens of human culture to understand how and why people are shifting away from text-based communications to use more image and multimodal (images, video and text combined – think Snapchat and GIFs) forms of conversation.

“We are trying to bring in humanities-centred discussions into big data,” O’Halloran explains.

To quote a favourite idiom of the internet: science will tell you how to clone a T-Rex; humanities will tell you why this might be a bad idea. Big data can tell us how, where and how much we use social media; but humanities will teach us why and what it might mean for us. Combine science and humanities together and you can begin to understand society as it rapidly evolves in the digital age.

To illustrate the potential of their research, O’Halloran and her colleagues analysed social media in Singapore, generating a map to visually represent Instagram and Twitter “hotspots” in the city. Details even went down to the types of photos posted and the sentiment of tweets. Cross-reference this with other geo-tagged data (such as the Foursquare locations seen in the picture below) and they could determine what kind of activity users were up to.

Foursquare Locations around Changi Airport, Singapore (Credit: Alvin Chua).
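
One simple way to find “hotspots” in geo-tagged posts is to divide the map into a grid and count the posts that fall into each cell. The sketch below illustrates that general idea only; it is not the researchers’ method, and the function name, the cell size and the sample coordinates are all made up for illustration.

```python
from collections import Counter


def find_hotspots(posts, cell_size_deg=0.01, top_n=3):
    """Count geo-tagged posts per grid cell and return the busiest cells.

    posts: iterable of (latitude, longitude) pairs in decimal degrees.
    Returns a list of ((lat_index, lon_index), count) pairs, busiest first.
    """
    counts = Counter()
    for lat, lon in posts:
        # Snap each coordinate to the index of the grid cell containing it.
        cell = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        counts[cell] += 1
    return counts.most_common(top_n)


# Made-up coordinates, for illustration only.
sample_posts = [
    (1.3644, 103.9915),
    (1.3648, 103.9911),
    (1.3641, 103.9919),
    (1.2839, 103.8515),
    (1.2835, 103.8512),
    (1.3005, 103.8005),
]

for cell, count in find_hotspots(sample_posts):
    print(cell, count)
```

The busiest cells are the hotspots. A real analysis would also attach the kind of detail the article mentions, such as photo type or tweet sentiment, to each cell.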

“We’re really interested to see how people’s communication changes depending on their location and the social activity they do,” O’Halloran says.

Their research found that the Instagram selfie reigned supreme, with people taking photos of themselves and others no matter where they were or what kind of social activity they were involved in – be it business, entertainment or just a trip to the local bar.

Data also showed that the Singapore community is much more discerning when it comes to Twitter. Unlike Instagram, Twitter posts changed according to where users were (and presumably the type of activity they were involved in), with tweets from the business district becoming formal and less emotive compared to those from entertainment areas.

Innovation and Creation is Big at Home

O’Halloran and colleagues have recently used the HIVE’s capabilities and the big data analytics software Starlight Visual Information System (previously used by the US military to combat terrorism) to track public websites and Instagram and Twitter texts, and to analyse online media and social media posts in the Curtin community.

Their findings indicate that “innovation” and “creation” themed posts in online and social media are common in the Curtin community. As in the Singapore project, location data was used to find “hotspot” areas for these themes on the Curtin Bentley Campus.

Visualising public discourses (websites, Twitter and Instagram) at Curtin University.

This research can form the basis for developing new data mining algorithms that are suited to analysing multimodal data and finding patterns within it, be it a form of behaviour or a cluster of related themes or ideas.

This is especially relevant to Internet of Everything research.

“Digital culture and technology allows us to interrogate, investigate and understand human social life on a scale never seen before,” O’Halloran says.

So by the time the third Avengers movie rolls around, I’ll not only be told of its release and current screenings, but that I should invite Friend A over Friend B, because Friend A’s online behaviour indicates they like superhero flicks, while Friend B prefers drama.

Best method of contact? Don’t bother with email – Friend A is out walking the dog and their average response rate is less than 10 per cent anyway. Try WhatsApp; response rate is at 80 per cent on this service. Added bonus: I’ll be told Friend A’s availability based on their calendar and what session times we’re both free.

This kind of communication could have implications for a range of services – from creating a better customer experience, to aiding emergency and disaster response in a community. So far it is just a glimpse of the potential big data has to give us insights into how whole populations of people communicate, how that communication is evolving and how it changes depending on the situation.

Not bad for just 0.5 per cent.
