Data visualization – an essential element of presenting findings

Data is, in essence, a lot of numbers. They mean something only when put together and analyzed. However, the analysis can be lost on someone who is not intimately familiar with the data and/or how to read the analysis. It is here that data visualization helps.

I came across this interesting post about data visualization at the turn of the century: How W.E.B. Du Bois used data visualization to confront prejudice in the early 20th century

I believe it goes to show the power of data visualization and how it can be used to show uncomfortable truths in a simple and straightforward manner that is still very effective.

Canadian billionaire bemoans lack of Canadian leadership

Jim Balsillie writes about the need for Canada to have a national data strategy.

This is not the first time Canada has dropped the ball on this topic. Maclean’s magazine wrote about it in September 2017, Michael Geist wrote about it in December 2017, the CBC posted about it in July 2017, and CTV also covered it in September 2017. All in all, there has been some coverage of the topic spread out over a long period of time, but I doubt Canadians realize what is going on, and they are definitely not getting on the phone to their MPs and telling them to get educated and stand up for Canada.

Given the many instances of data abuse by individuals and organizations, it is quite worrying that even in the year 2018 an alarm is not being raised over this issue.

The data war behind net neutrality: CC republish

The data war behind net neutrality

Ajit Pai, former Verizon lawyer turned head of the U.S. Federal Communications Commission (FCC), rolled back President Barack Obama’s net neutrality policy in December.
(AP Photo/Jacquelyn Martin)

Roger Kamena, L’Université TÉLUQ; Daniel Lemire, L’Université TÉLUQ, and Nicolas Scott, Université de Montréal

In the social and political saga surrounding the question of net neutrality, what is often overlooked is the data war going on behind the scenes. The real fuel behind the debate is the enormous volume of data we generate with each search and click.

As a marketable commodity, large-scale audience data has completely transformed the global economic landscape in less than a decade. The emergence of GAFA (Google, Amazon, Facebook and Apple) germinated a disruptive new business model that capitalizes on what many consider to be the new oil: Data.

Based on a study published by eMarketer in September 2017, we can see how user-data companies (UDC) now hold the top five positions among the largest brands in the world.

In 2006, five of the top 10 brands were retailers. By 2017, nine of the top 10 brands in the world were UDCs.

The business of user data

The nature of the data business model can be understood by the relationship between its three core pillars: The internet user, who generates the data; the content publisher, who offers the internet user a service (often free) in exchange for personal data; and the advertiser, who buys data from content publishers in order to run more effective marketing campaigns.

The schema below attempts to illustrate the nature of this internet user data paradigm:

Credit: Adviso Conseil Inc.

Clearly, the winners in the 2017 repeal are the large U.S. telecommunications companies, who happen to be the glue, as internet providers, between the internet user and the publisher (Google, Facebook). They stand to gain an enormous strategic advantage with the end of net neutrality.

By having more control over an individual’s internet usage, those companies are in a position to adjust prices in ways that could significantly benefit their bottom line. For example, AT&T could decide that from now on, given the large bandwidth used by Netflix, the latter would have to pay a usage fee to maintain its regular website streaming speed.

Conversely, the internet service provider (ISP) could just as well charge internet users an extra fee to maintain their Netflix streaming at a regular or faster speed. In an extreme case of greed, the ISP could overcharge both Netflix and its user.

But there is more to it than that.

The real reason Verizon bought AOL and Yahoo!

In 2015, Fortune laid out what it deemed to be the “real reason Verizon bought AOL.” In that article, journalist Kevin Fitchard observed:

“Verizon isn’t trying to create an Internet powerhouse with this investment. It’s likely just trying to gain some type of foothold in the changing online industry, as its traditional communications business slows down.”

Fitchard is alluding to the dominance of the data business model that gave rise to GAFA. As such, we can see why telecom companies like Verizon that control the internet channels through which the data is transmitted would also want to control — and take advantage of — the data itself. As Fitchard further observes in the same article:

“While AOL may be most known for its dial-up services and growing content empire — which includes The Huffington Post, Engadget and TechCrunch — it also has put together a sophisticated suite of advertising technologies for online and traditional media that no other company (aside from Google and Facebook) can match.”

The advertising technology in question, commonly referred to as programmatic advertising, uses advanced machine learning and artificial intelligence (AI) on the data generated by online user behaviour, and tracked by browser cookies or device IDs stored in mobile applications. Much of the advertising performance offered by Google, Facebook, AOL and others is largely attributed to their investments in this kind of technology, which Verizon can now leverage.

As described in an email by John Cosley, director of marketing for Microsoft search advertising, digital ads are “perhaps by far the most lucrative application of AI [and] machine learning in the industry.”

The birth of a super entity

To maximize the power of these advertising algorithms, companies need to secure big data. Since internet users are the prime generators of this precious raw material, publishers need to continually increase the number of visitors coming to their websites or mobile applications.

In a move to secure that expansion, shortly after its acquisition of AOL, Verizon bought Yahoo!, Google’s competitor in the search engine market. Yahoo! also has access to the entire Microsoft advertising network and its user data.

In order to assess the impact of this streak of acquisitions on total user reach of Verizon vs Google and Facebook, we used comScore data from May 2017, made available courtesy of Adviso Conseil. The comScore platform is essentially an audience analytics software used to track the data coming from most of the large desktop and mobile publishers in the world.

The data pulled for this graphic shows the distribution of unique visitors across all the top platforms in the United States. The chart clearly shows Yahoo! and Microsoft competing closely with Google and Facebook in terms of user reach.

comScore data for the United States (June 2017)

The competitive advantage of this merger — now a super-entity called Oath by Verizon — stands out immediately when one looks at the combined reach of AOL, Huffpost and Yahoo!.

The U.S. and the rest of the world

The best way to illustrate the direct relationship between data and net neutrality is to simply ask the following question:

If a telecommunication company like Verizon were in a position to compete with Google and Facebook for data dollars, what happens if it also controls the data pipeline used by its competitors?

The answer is obvious. If U.S. telecoms can capriciously control internet access, while also controlling platforms that compete with GAFA, what stops them from impeding the pipeline of their competitors? Absolutely nothing.

Back in 2014, German Chancellor Angela Merkel spoke out against net neutrality. We expect the recent decision in the U.S. to further affect the polarity of opinions on net neutrality in that region and in the rest of the world.

We should also note that Google, in an obvious preemptive response to the end of net neutrality, launched its own ISP infrastructure in 2010 called Google Fiber.

In the end, with the reigning status of the global top 10 brands on the line, the data war is undoubtedly what drives the debate over net neutrality.

Roger Kamena, Principal Consultant, Digital Media and Data Science, L’Université TÉLUQ; Daniel Lemire, Professor, L’Université TÉLUQ, and Nicolas Scott, Université de Montréal

This article was originally published on The Conversation. Read the original article.

Open Banking

I came across this by chance, and after reading some articles about it, I think it is a mind-blowing idea. Open Banking has the potential to give some power back to the bank customer. Not much, but at least customers will be able to shop around for better deals.

I doubt it will come to Canada in the way it should, because the banks here will certainly go out of their way to squash it thoroughly. I mean, just look at what happened to Tangerine Bank, formerly ING Direct Canada.

Here’s the BBC on Open Banking, and Wikipedia on the same topic.

Wired has an excellent primer on Open Banking. There’s also this interview with the man at the centre of Open Banking in the UK.


China’s Social Credit System puts its people under pressure to be model citizens (CC Republish)

China’s Social Credit System puts its people under pressure to be model citizens

China has introduced the Social Credit System in 12 demonstration cities.

Meg Jing Zeng, Queensland University of Technology

In less than a month, China’s Lunar New Year will bring the country’s annual epic travel rush – the largest human migration on earth.

While many are planning trips to their home towns to attend family reunions, millions more Chinese citizens have been blacklisted by authorities, labelled as “not qualified” to book flights or high-speed train tickets.

This citizen ranking and blacklisting mechanism is a pilot scheme of China’s Social Credit System. With a mission to “raise the awareness of integrity and the level of trustworthiness of Chinese society”, the Chinese government is planning to launch the system nationwide by 2020 to rate the trustworthiness of its 1.4 billion citizens.

What ‘credit’ means in China

The word “credit” in Chinese – xinyong (信用) – is a core tenet of traditional Confucian ethics, which can be traced back to the late 4th century BC. In its original context, xinyong is a moral concept that indicates one’s honesty and trustworthiness. In the past few decades, its meaning has been extended to include financial creditworthiness.

So what does “credit” mean in the Social Credit System?

It is a question Chinese authorities have been exploring for more than 10 years. When the plan of constructing a Social Credit System was first proposed in 2007, the primary goal was to restore market order by leveraging the financial creditworthiness of businesses and individuals.

Gradually the scope of the project has infiltrated other aspects of daily life.

Actions that can now harm one’s personal credit record include not showing up to a restaurant without having cancelled the reservation, cheating in online games, leaving false product reviews, and jaywalking.

Ninan Transport Police demonstrates how facial recognition is used to identify pedestrians jaywalking.
Nanjin Transport Police’s public Weibo post

Reaping rewards for ‘good deeds’

One shared focus of the country’s existing pilot schemes is to generate a standardised reward and punishment system based on a citizen’s credit score.

A Chinese citizen showing her ‘trustworthy card’
Henan Broadcasting’s public Weibo post

Most pilot cities have used a points system, whereby everyone starts off with a baseline of 100 points. Citizens can earn bonus points up to the value of 200 by performing “good deeds”, such as engaging in charity work or separating and recycling rubbish. In Suzhou city, for example, one can earn six points for donating blood.

Being a “good citizen” is well rewarded. In some regions, citizens with high social credit scores can enjoy free gym facilities, cheaper public transport, and shorter wait times in hospitals. Those with low scores, on the other hand, may face restrictions to their travel and public service access.

At this stage, scores are connected to a citizen’s identification card number. But the Chinese internet court has proposed an online identification system connected to social media accounts.

Naming and shaming of blacklisted citizens

Publishing the details of blacklisted citizens online is a common practice, but some cities choose to take public shaming to another level.

Several provinces have been using TV and LED screens in public spaces to expose people. In some regions authorities have remotely personalised the dial tones of blacklisted debtors so that callers will hear a message akin to: “the person you are calling is a dishonest debtor.”

Blacklisted debtor displayed on an LED screen in Taishan city.
Taishan Government via WeChat

It is important for a country to be able to enforce court orders, but when the judicial and legislative systems sometimes malfunction, as they do in China, it raises questions about whether the ability to expose and punish without due process can lead to abuses of power.

Liu Hu, a vocal journalist who has criticised government officials on social media, was accused of “spreading rumour and defamation”. While seeking legal redress in early 2017, he realised that he was blacklisted as “untrustworthy” and prohibited from purchasing plane tickets.

Liu’s story may be an isolated incident, but it demonstrates how the system could potentially be used to push the government’s agenda and to crack down on dissent.

Harnessing the power of big data

The role of big data in the project has received broad media attention outside China due to concerns about how the Chinese government may use its power to further intensify surveillance.

For example, Chinese tech giants Alibaba and Tencent are testing user credit files based on behavioural data gathered through people’s use of social media and e-commerce sites. To date, few operational details have been released about the country’s plan to integrate user data from online platforms into a central system overseen by the government.

This will soon change. Since last December, the National Development and Reform Commission and the Central Bank of China have begun approving pilot plans to integrate big data with the Social Credit System. As one of China’s first pilot provinces, Guizhou province was selected to showcase a government-led experiment of a big data-empowered Social Credit System.

Guizhou is one of the poorest provinces in China, and is mostly known for being the home of Maotai – a high-quality liquor. This seemingly random choice of location is actually tactical. Unbeknown to most, since 2015 this rural backwater has been fast becoming the country’s hub of big data.

Xi Jinping visiting the big data pilot zone in Guizhou.
Xinhua News

In 2017, tech giants Google, Microsoft, Baidu, Huawei and Alibaba established research facilities and data centres in the region. In 2018, Apple is following suit and transferring its Chinese iCloud server to a local company.

Guizhou’s position as the country’s data centre makes it an ideal social laboratory for the local government’s Social Credit System experiments.

Turning the system back on the government

While some might view China’s Social Credit System as something out of dystopian fiction, if properly implemented the system can have positive impacts – especially when used to keep government officials and business owners accountable.

Most pilot schemes target companies as stringently as individuals. Firms with a history of environmental damage or product safety concerns are now regularly exposed on online blacklists.

Government officials can also be found on online blacklists. As of December 2017, more than 1,100 government officials had been blacklisted as untrustworthy. Such a move to expose corruption is arguably more beneficial to Chinese society than public shaming of jaywalkers.

As Professor Du Liqun of Peking University argues, in the Chinese context the construction of a Social Credit System should start with building a trustworthy government.

Meg Jing Zeng, PhD candidate, Queensland University of Technology

This article was originally published on The Conversation. Read the original article.

Data uncertainty visualized

When dealing with data, a common assumption is that data either proves or disproves something, straight up. There is no ambiguity. Or at least, that is what one usually assumes about data.

However, the truth is that where there is data, there is bound to be uncertainty. And visualizing uncertainty is an important part of visualizing data if one is to responsibly present data. This post does an excellent job of explaining the pros and cons of various ways of visualizing uncertainty in data.
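To make the idea concrete, here is a minimal sketch of one common way to show uncertainty: reporting a mean together with an interval around it, and drawing that interval as a simple text "error bar". The sample values are made up for illustration, the function names are my own, and the 1.96 multiplier assumes a normal approximation, which is rough for a sample this small (a t-distribution would give a wider interval).

```python
import math
import statistics


def mean_with_interval(samples, z=1.96):
    """Return (mean, low, high) for an approximate 95% interval.

    Assumes a normal approximation; z=1.96 is an illustrative choice.
    """
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * sem, mean + z * sem


def text_error_bar(low, mean, high, axis_min, axis_max, width=40):
    """Draw the interval as a one-line text error bar on a fixed axis."""
    def col(x):
        # Map a value to a column, clamped to the drawable range
        pos = round((x - axis_min) / (axis_max - axis_min) * (width - 1))
        return max(0, min(width - 1, pos))

    line = [" "] * width
    for i in range(col(low), col(high) + 1):
        line[i] = "-"
    line[col(mean)] = "o"  # mark the point estimate
    return "".join(line)


# Hypothetical measurements, made up for illustration only.
samples = [9.8, 10.4, 10.1, 9.5, 10.7, 10.0]
mean, low, high = mean_with_interval(samples)
bar = text_error_bar(low, mean, high, axis_min=9.0, axis_max=11.0)
print(f"mean = {mean:.2f}, interval = [{low:.2f}, {high:.2f}]")
print(bar)
```

The point of the bar is that the reader sees a range rather than a single number, which is a more honest summary of what six measurements can support.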

Stamp’s Law

As much as this blog is about data, it is worth acknowledging that data, first, has to be collected. As data has become a more and more prominent topic in the media and more and more faith is put into data, the following quote reminds me of the chink in data’s armour:

“The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the chowkidar (village watchman in India), who just puts down what he damn pleases.” – From Wikipedia

Data vulnerability of job seekers

As a job seeker, I apply to plenty of jobs online.

Job applications are mainly submitted in two ways:

  1. Upload (cover letter and) resume
  2. Create account, and then upload (cover letter and) resume

The Ontario government has a simple, straightforward interface that asks for some personal information and then accepts an upload of a cover letter and resume as one file.

Then, there are the websites where you have to create an account to upload your cover letter and resume. This LinkedIn post very accurately captures the frustrations associated with this system. While I do not quite agree with everything in the post, the points it makes are very valid.

This article accurately captures the overall frustrations associated with searching for a job, including the above-mentioned job application systems. Another article focuses solely on the frustrations, without commenting on the job application systems. To round out the picture, a third article talks about how the reality of job applicants is not reflected in the numbers.

But to come back to the main point: many of the websites where I submitted my job applications were operated by either ICIMS or Taleo, among others. ICIMS and Taleo are Applicant Tracking Systems (ATSs) – automated tools that help companies parse thousands of job applications so they can spend less time reading letters and resumes and just hire someone to do the work. Automation gives rise to more automation – just as companies use ATSs to automate hiring, companies are popping up to automate the job application process itself, to help applicants beat the ATS.

The part that I would like to note is that an applicant may end up submitting applications to several different companies, all of which use ICIMS as their ATS. In such an instance, the usefulness of LinkedIn as a single platform vanishes – it would instead be useful to create a profile on ATSs like ICIMS and Taleo and then just apply for jobs through them!

Others, like Deloitte, have their own ATS, but it is hosted in the US even if you are applying for a job based outside the US. Under US law, data stored in the US is subject to access by the US government.

ATSs, as far as I know, do not store their data in Canada – as most are US-based, the data also ends up there. This raises the issue of data sovereignty – even though I am an applicant in Canada, applying for a job in Canada, my data will end up in the US. Granted, by using Gmail I am already giving up my data to the US, but that is because I want the free email service. How does that argument apply to my job search? As a Canadian applying for a job in Canada, it is reasonable to expect that my job application and related data stays in Canada. Yet, I am forced to give up my data sovereignty just to be able to apply for a job, let alone be hired! (Not that it is right for a person to have to give up their data sovereignty to be hired either.)

By forcing job applicants to give up control over their own data, the job application process takes advantage of the vulnerable status of the applicants, makes them further vulnerable, and also violates their data sovereignty. The question here is not why the job applicant continues applying, but why non-US-based companies are happily giving up their own data to the US.