Creating Personas for Audience Research in Social Media

The development of personas is a common creative resource among advertisers. In IBPAD projects, we use the tactic to represent and single out some of the audience segments identified through data and research. The whitepaper Creating Personas for Audience Research in Social Media shares some of our experience. Personas increase legibility, engagement and rapport between findings and readers. Check out and download the whitepaper:


Creating Personas for Research Reports in Social Media

This new IBPAD guide demonstrates the process of building persona illustrations for reports on social media audiences, based on real experiences. It was written by illustrator Yuri Amaral, a frequent partner in our projects, and Tarcízio Silva, the institute's Director of Digital Communication Research. Access it:

Textual Analysis – Deputies’ speeches during the impeachment voting session

Journalist and political science teacher Fábio Vasconcelos published, on the blog Na Base dos Dados, a quick analysis that we created of the parliamentarians’ speeches during the impeachment voting session in the Chamber of Deputies on 17th April.

For the curious, the data source was the Chamber of Deputies’ PDF (yes, believe it or not). I tried to extract the speeches via R using the bRasilLegis package, but I discovered that they would only be released between today (19th) and tomorrow. (Update: the speeches are now available.)

After handling the PDF, I used Iramuteq. For those who do not know it, Iramuteq is a “visual interface” for using R to produce textual analysis. It is the “offspring” of a well-known French textual analysis program called Alceste. However, its techniques are strongly based on the French literature on textual analysis.

One of Iramuteq’s most interesting techniques is the Reinert method, which groups words according to their proximity. It allows you to find large thematic groups in a body of text in a very simple way.

In this example, we found four large groups:



Report: Brazilian Plus Size Fashion and Beauty on Facebook

Who are the main bloggers and content producers on plus size fashion and beauty on Facebook? The study applies a methodology developed by IBPAD to map the universe of references for a specific segment, with the aim of understanding, across multiple dimensions, who leads content production on the platform.

Analysing the Public Budget Course

Understanding the processes of formulating and executing the federal public budget is fundamental for government relations professionals. Firstly, knowledge of the topic is paramount to understanding the country’s current political and economic setting. The federal budget is constantly on display in the main media outlets. Recent examples include news about the 2015 public deficit; the consequent increase in public debt; the need to recreate the Provisional Contribution on Financial Transactions (CPMF); the urgency of reforming the welfare system; and the financial challenges of Petrobras. These events can affect a wide range of public or private institutions to a lesser or greater degree.

The second reason why knowledge of the public budget is essential for a government relations professional is that it is the main tool the federal government has to govern. By understanding the budget, a government relations professional is able to understand the federal government’s priorities, its revenue prospects and its investment proposals.

Professionals who want to take part in preparing or improving public policies need to know about and follow the submission and approval of the Multi-Annual Plan (PPA), the Budget Guidelines Law (LDO) and the Annual Budget Law (LOA) by the National Congress. They should also be able to understand additional credits and track the execution of parliamentary amendments, which recently made newspaper headlines with the so-called mandatory budget Proposal to Amend the Constitution (PEC), which authorised R$ 9.0 billion for individual amendments in 2016.

Lastly, knowledge of the public budget gives a government relations professional the capacity to illustrate phenomena or anticipate trends. This is the third reason to understand the budget. The downturn of the Brazilian oil and gas sector, for example, could have been anticipated if the drop in Petrobras investments in recent years had been observed.

Petrobras budget

As can be seen in the graph above, Petrobras had been decreasing its investments since 2013, following a period of significant increase in expenditure. It should be noted that the values shown were not corrected for inflation during the period. If they had been, the decrease in investments after 2013 would be even more pronounced.
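The inflation correction mentioned above can be sketched in a few lines of Python. This is only an illustration: the investment figures and the price index below are invented placeholders, not Petrobras or official data, and the function names are hypothetical.

```python
def to_real_values(nominal, price_index, base_year):
    """Express nominal values in constant prices of base_year."""
    base = price_index[base_year]
    return {year: value * base / price_index[year]
            for year, value in nominal.items()}


def pct_change(series, start, end):
    """Percentage change of a series between two years."""
    return (series[end] - series[start]) / series[start] * 100


# Hypothetical nominal investments (any currency unit) and price index.
nominal_investment = {2013: 100.0, 2014: 90.0, 2015: 80.0}
price_index = {2013: 100.0, 2014: 106.0, 2015: 117.0}

real_investment = to_real_values(nominal_investment, price_index, base_year=2013)

print("nominal change:", round(pct_change(nominal_investment, 2013, 2015), 1), "%")
print("real change:   ", round(pct_change(real_investment, 2013, 2015), 1), "%")
```

With any rising price index, the drop measured in constant prices is larger than the nominal drop, which is the point made in the paragraph above.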

Therefore, understanding the premises of the federal public budget and being able to analyse the information made available by the federal government is of utmost importance for professionals who intend to mediate the relationship between the private and public sectors. This is highly relevant knowledge, but it has been neglected by the main courses in the area, whether at undergraduate or postgraduate level. In order to fill this gap, the Brazilian Institute of Research and Data Analysis (Instituto Brasileiro de Pesquisa e Análise de Dados – IBPAD) is offering a Federal Public Budget course during March. Click here for further information.

Would you like to learn how to carry out searches like those performed in this text?

Simply visualising Petrobras investment information, as presented in the text above, requires basic knowledge of Siga Brasil, whether theoretical or practical. The IBPAD Federal Public Budget course provides the knowledge needed to give participants autonomy when performing analyses and to correctly interpret the information obtained.


On the theoretical side, for example, one needs to be aware that the revenue of state-owned companies is not available on the system. Only information regarding their investments is made available.

Therefore, the first step is to open Siga Brasil. Then, click on the “Web Intelligence” option.

Siga Brasil

The third step is selecting the “New” option, followed by “Universe”. Several alternatives then appear. Select the “executing state investments expenditure” universe for the desired year.

Siga Brasil

The last step is customising the search according to the required parameters. At this point it is essential to be clear about which institutions you want to search, the period to be covered and which filters should be applied, among other things.


Would you like to find out more? Enrol on the course!


The underutilized potential of social media monitoring

Monitoring social media, as offered for around 10 years by digital agencies, consultants and the like, is based on some important premises of digital communication. First, we have the notion of the potential influence of online content. The scheme proposed by Philip Sheldrake below is eloquent in its simplicity:

philip sheldrake - influence

The six flows above address, respectively: the influence that brands try to exert on consumers (through content, announcements, special campaigns); the conversation among consumers / citizens themselves (word-of-mouth marketing, fandoms, technical support etc.); the influence of consumers on brands (complaints, demonstrations, customer service); and the three equivalent flows regarding competitors, which are also acting and fighting in the competitive environment. It is from these flows that social media monitoring gains strength, whether it studies word of mouth or corporate communications. But there is an under-utilized potential in these data that suppliers do not use.


With the introduction and popularization of full monitoring tools (those that allow collection, storage, sentiment analysis, classification and reporting) from 2006-2007 in North America and Europe (like Sysomos, Brandwatch, Radian6) and 2009-2010 in Brazil (like Scup, Seekr, Brandcare), social media monitoring was consolidated as a service and practice in advertising agencies and the like. These practices were developed relatively apart from the standards, theories, knowledge and praxis of market research.

Partly because of this, in most of the more general tools (with the exception of specialized tools or the most market-relevant ones) one can see a relative stagnation of the possibilities offered. Three gaps are particularly relevant here because of their intelligence potential, and they become especially serious when we consider how vastly the volume of data available today exceeds what the pioneers of some methodologies had access to.

Focus on Issuers / User Monitoring

The data in most of the tools are based on so-called mentions or occurrences. The count of content units collected (tweets, posts, comments, videos and photos) is the basis for comparisons and measurements of volume, sentiment, topics, answers etc. But this hides a large discrepancy, because a minority of users publishes most of the occurrences, often with a concentration much more intense than a Pareto distribution.

Occurrences x Users. Thus, one of the industry standards, the reporting of data according to the number of occurrences / mentions, is questionable in most of the deliveries made. The ability to deliver data based on the quantification of users, and to compare the variation and concentration of engagement, brings greater reliability to conclusions.
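As a minimal sketch of the difference between counting occurrences and counting users, the Python snippet below measures how much of the volume comes from the most active profiles. The data format (a list of user and text pairs), the function name and the sample mentions are assumptions made for illustration; real monitoring exports will look different.

```python
import math
from collections import Counter


def concentration(mentions, top_fraction=0.1):
    """Summarise occurrences versus users for a list of (user, text) pairs."""
    per_user = Counter(user for user, _ in mentions)
    total = sum(per_user.values())
    if total == 0:
        return {"occurrences": 0, "users": 0, "top_share": 0.0}
    # Number of profiles that make up the most active `top_fraction` of users.
    n_top = max(1, math.ceil(len(per_user) * top_fraction))
    top_total = sum(count for _, count in per_user.most_common(n_top))
    return {
        "occurrences": total,
        "users": len(per_user),
        "top_share": top_total / total,
    }


# Hypothetical sample: one very active profile and nine occasional ones.
mentions = [("u0", f"post {i}") for i in range(11)]
mentions += [(f"u{k}", "post") for k in range(1, 10)]

summary = concentration(mentions)
print(summary)
```

In this invented sample, a report based on occurrences would show 20 mentions, while a report based on users would show 10 people, one of whom produced more than half of the volume.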

Sample designs. Samples in social media monitoring tools are mostly based on occurrences rather than on the users themselves. There is still much methodological ground to cover regarding the reliability and representativeness of samples taken from this type of collection, but, depending on the goal, taking users as the basis (instead of occurrences) reduces imprecision.

Studies focused on users. Search parameters that collect all of a user's mentions in media such as Twitter, Instagram, Facebook Groups or blogs widen the range of information products. The most inventive professionals use these resources to develop audience studies, persona ideation, segment construction and the like. However, options for tagging, filtering and extracting data focused on users are still relatively uncommon, even among market leaders. Something as simple as allowing users to be coded, and reporting results based on these codes, greatly widens the range of deliverables (and, by extension, monetization).


Techniques of Corpus Linguistics and Text Analytics

The text analytics techniques in tools are generally still very archaic. In the vast majority, word frequency counts are based only on 1-grams, stop words are not editable, there is no lemmatization, and other features involving associations between words do not exist. Some seek to develop natural language processing in search of the ill-fated automatic sentiment analysis, leaving aside the rest of the wealth in the text.
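To illustrate how little is needed to go beyond 1-gram counts, here is a minimal Python sketch of n-gram frequency counting with an editable stop word list. The function name, the tokenization rule and the sample texts are assumptions for illustration, and lemmatization is deliberately left out because it depends on language-specific resources.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2, stop_words=frozenset()):
    """Count n-grams across texts after removing the given stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase, split into word tokens and drop stop words.
        # Note: removing stop words first means n-grams can bridge the gap
        # left by a removed word.
        tokens = [t for t in re.findall(r"\w+", text.lower())
                  if t not in stop_words]
        # Zip the token list against shifted copies of itself to get n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical sample and a stop word list the analyst can edit freely.
texts = [
    "The budget of the federal government",
    "Federal government budget",
]
stop_words = {"the", "of"}

bigrams = ngram_counts(texts, n=2, stop_words=stop_words)
print(bigrams.most_common(3))
```

Changing `n` gives unigrams or trigrams from the same function, and the stop word set is an ordinary parameter rather than a fixed list inside the tool.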

British National Corpus development team

A simple definition of corpus linguistics by Tom McEnery makes the potential relationship between the areas clear: “Corpus Linguistics is the study of language on a large scale: the computer-supported analysis of extensive collections of written words or texts.”

When you look at the history of corpus linguistics, for example, the difference in the volume of data available is monumental. One of its major milestones was the creation of the Brown Corpus in the 1960s. Researchers at Brown University compiled 500 texts representative of American English, with 2000 words each, totaling 1 million words.

Any medium or large monitoring project easily surpasses that raw volume of data. Even HERMES, the 7-million-tweet corpus developed by Michele Zappavigna for her 2012 book, seems small compared with the size of the combined projects of large companies that carry out social media monitoring.

journalists x lawyers - frequency
Term frequency by audience cluster

Strictly speaking, there is no representativeness or strict corpus construction in these cases, but why are such volumes of textual data not exploited with techniques inherited from corpus linguistics? It is not as if the technologies were out of reach of developers, since advanced software such as Voyant Tools and AntConc is freely and openly available.

AntConc, a free concordancer developed by Laurence Anthony


Social Network Analysis

Perhaps the most shocking thing in a market that calls its employees “social network analysts” is how rarely social network analysis concepts are used. Terms such as nodes, links, graphs and clusters are rare in the lexicon of an area that still ignores decades and decades of thinking about the dynamics of networks.

The two images below exemplify the issue of scale when we talk about networks in social media. The first is an adaptation of the study performed by Jacob Moreno in a girls' school in the 1930s, analyzing the sociability dynamics of the students. The 26 students and 52 connections represent the inequality of affection and social attention:

dining table partners - jacob moreno

The next is a visualization of data from a small study we did on the #PrimeiroAssédio (#FirstHarassment) mobilization. Tens of thousands of tweets were collected in a matter of hours, resulting in a navigable network from which patterns of interaction, influence, conflict and collaboration emerged. An almost unimaginable dream for the pioneers of SNA decades ago:

network analysis - #PrimeiroAssédio

Strictly speaking, the connections between profiles or hashtags are already there, implied in any monitoring of published tweets and images. But while technology only looks at data collection, leaving proactive innovation aside, these connections are not taken advantage of and do not turn into intellectual or financial value. As seen in the other post, Ferramentas de Análise de Redes para Mídias Sociais (Network Analysis Tools for Social Media), commercial tools with network features are still rare, hampering the work of professionals. And it is not for lack of opportunity and demand: the market is eager for it, and educators make intensive efforts to turn alfabetização em redes (network literacy) into something universal.
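A minimal sketch of how those implied connections can be made explicit: the Python snippet below turns a list of tweets into a directed network of mentions and counts how many distinct profiles mention each node. The input format, the function names and the sample tweets are assumptions for illustration; it uses only the standard library, and the resulting edge list could be loaded into any network analysis tool.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Build weighted directed edges (author -> mentioned profile)."""
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges


def in_degree(edges):
    """Number of distinct profiles that mention each node."""
    degree = Counter()
    for _source, target in edges:
        degree[target] += 1
    return degree


# Hypothetical tweets as (author, text) pairs.
tweets = [
    ("ana", "@bia great thread"),
    ("carla", "agree with @bia and @ana"),
    ("bia", "thanks @ana"),
]

edges = mention_edges(tweets)
print(in_degree(edges).most_common())
```

Nodes, links and a first centrality measure come directly out of data that any monitoring tool already collects.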

There is still a long way to go before we, communication professionals, seize the full potential of data and digital methods. In our Social Media Monitoring courses, we try to present, explore and discuss some of these possibilities. And you, what do you think? Is it possible to move on to a smarter digital intelligence market?

Learning R is an investment for your Data Science career

Article written by Guilherme Jardim Duarte, teacher on the “R Programming” course.

Data science is omnipresent today. When you buy something on the internet or access a newspaper site, data is collected and later analysed to better understand your behaviour. Data is cross-checked to understand the behaviour of individuals, organisations and governments. Multiple sources of information are compiled to estimate the probability of an event taking place, of a football team winning a match, or even which film or series you are going to like (the rpca package is used by Netflix, for example). Understanding how this is done is imperative in today’s labour market. Learning R is a leap towards learning data science and one of the best investments you can make in your career.

R is a programming language developed to deal with statistics and data analysis, which emerged in 1993 from another language, called S. It is one of the most commonly used today, both scientifically and analytically, and is one of the fastest growing programming languages. As this graph shows, R has the highest growth rate among users of Stack Overflow, an internet forum that focuses on programming.

subway - style rank year tag plot

One of the advantages of R is that it was created to deal with data. After the syntax is mastered, it is easy to join data, clean columns, resolve textual problems, and various other actions which cause problems for every analyst.  The language’s functional structure allows actions to be executed which would be difficult in other programming languages such as Python or C++.  In addition, unlike common tools such as SAS, SPSS and Stata, R has open-source code and is free to use.

Another important advantage of R is that there are numerous free packages with statistical models ready for use. As the community which develops packages for R is very large and active, it is very likely that a recently developed model will quickly appear among the packages maintained. In other words, the new techniques developed in that extremely innovative article you read will probably be available in R, free of charge. If you use R for econometrics, for example, there is a series of packages which facilitate causal analyses. A further example is machine learning: various algorithms are implemented in R packages, making them really easy to use, and important authors in the area, such as James, Witten, Hastie and Tibshirani, have a book which focuses on the language (An Introduction to Statistical Learning).

R supplies solutions for a wide range of fields. Even for data visualisation, you can count on packages which implement dynamic visualisations in JavaScript, such as rdimple and leaflet. For static visualisation there is ggplot2, created by Hadley Wickham, which allows beautiful graphics to be created very simply, like the ones below.

grafico R - ggplot2

Mentioning Hadley Wickham also means telling a little more of the history of R. It is impossible to use the language today without using some of the packages created by him. He designed packages to make it easier to manipulate data (dplyr), clean data (tidyr), visualise data (ggplot2, mentioned above) and automate data collection on the internet (rvest), among many others. There is good reason that his group of packages is called the “Hadleyverse”.

In my opinion, R’s greatest advantage is that it is quite easy to learn. You do not need to go beyond basic programming to attain powerful results. This means that any Excel user can easily carry out operations in R. A language such as Python, although it is also efficient, has a much steeper learning curve, particularly because it is difficult to escape from concepts such as object-orientated programming.


Where to start?

There are various free R courses, and my colleague and fellow teacher Carlos Cinelli is also writing an accessible, free book for learning R on his blog. However, we know that the first step is often the most complicated one: there is often a lack of discipline and motivation to continue on your own. We will personally help you take the initial steps here.

We offer an R programming course at IBPAD. I am responsible for the São Paulo course and Carlos Cinelli for the Brasília course. We will teach you to program in R or, in other words, we would like to be the initiators of this long and enjoyable process of learning data analysis. In keeping with IBPAD’s values, we want to make you independent for your next steps.

To summarise, R is the best way to place yourself in data science in one go. You can easily execute various statistical functions with it, meaning maximum benefit with a low learning cost.  If someone is interested in making full use of data, there is no doubt: learn R.

Ethnography to Understand Culture in the Information Age

Article written by Professor Débora Zanini, responsible for the course Ethnography in Social Media

There is no denying it: we live in a new era.

Some just call it ‘post-industrial or modern society’. There are also those who call it ‘the age of knowledge’.

What is known, however, is that since the 80s mankind has entered an era of radical changes never seen before: if in past centuries the radical changes were agriculture and then industry, today we are experiencing the beginning of the Communication / Information Age.

With it come a new way of living, of human activity and understanding, new behaviors, new forms of employment and income, and especially a new form of power.

“Whoever dominates knowledge dominates the world,” Comblin 2006.

Physical distances are no longer limiting, the conception and perception of time have changed completely (and how!), cultures mix faster and forms of human communication emerge and transform at a frightening speed.

The Information Age is the new historical moment in which the bases of all relationships are established through information and the capacity to process it and generate knowledge. Castells (1999) called this phenomenon the “network society”, which is mainly based on the appropriation of the Internet, with its uses and features incorporated by the capitalist system.

For example, if communication used to be verbal, non-verbal and gestural, now we use mediated communication (a communication process in which some sort of technical apparatus intermediates between the speakers). The components of the communication process are the message sender, the receiver, the message itself, the propagation channel, the medium, the response and the environment in which the process takes place, so any change in these variables has a great impact on the communication process as a whole.

“The possibility of participation and exclusion of the digital universe, integrating the processing of data and generation of knowledge, or even standing on the sidelines of this dynamic affects, above all, the human relationship in which communication is active, passing the anthropological, social and even philosophical aspects. They are languages, uses, sensory perceptions, new identities formed and symbolic exchanges that are entangled in a network that does not rule out even the economic aspect within these new relationships. From the economy point of view, the network has brought profound changes to society, redefining the categorizations of International Labor Division (ILD) between countries and economies.” (SIMÕES, 2009)

Along the same lines, this coming together in the network forms a new culture that Lévy calls cyberspace culture, or “cyberculture”:

“Cyberspace (I will also call it the “network”) is the new means of communication that arises from the global interconnection of computers. The term specifies not only the physical infrastructure of digital communication, but also the oceanic universe of information it holds, as well as humans who sail and feed this universe. As for the neologism “cyberculture” here it specifies the set of techniques (material and intellectual), practices, attitudes, thought modes and values that develop along with the growth of cyberspace”. (LÉVY, 1999, p. 17).

And so it is no wonder that these issues are of great interest to scholars today, whether in the areas of social sciences, communication, journalism, psychology, marketing, among many others.


Tools to analyse Networks and Graphs in Social Media

voson uberlink

Analysing social media networks is one of the fastest-growing trends in the area of online data research and analysis. Although techniques of this type are already used by professionals and researchers, there is still a lot to be published and embraced by people and organisations. The Social Media Network Analysis course and our services and consultancy work in the area seek to advance SNA knowledge in Brazil.

Below we have listed some software that is useful for analysing social media networks and does not require any programming knowledge from users. If you are not familiar with network analysis, start by taking a look at the Starting with Networks document translated by IBPAD.


What is Ethnographic Research?

Ethnography is a methodology that has been gaining ground in the corporate world, particularly within agencies and companies working with social media. Previously known only within the academic world of the humanities, today it often stands out as a differentiating method in analysis and reporting.

In fact, qualitative analysis as a whole has gained a high profile in recent times. Qualitative research is no longer seen and described as what ‘is not quantitative’, but as a field to be explored and studied because, just as in quantitative research, several approaches and techniques can be applied, depending on the researcher’s interest and objective.

In general, the main aim of qualitative research is to understand, describe and sometimes explain social and cultural phenomena of social groups and / or individuals.

One of the main types of qualitative research, as already mentioned above, is ethnographic research. Both in the history of qualitative research and in the recent interest in it, ethnography has played a key role: “(…) much of what is known about the field relationships, about openness and direction towards a field and its members, we know through ethnographic research.” (ANGROSINO, 2009).


