Interview with Alberto Cairo
With the coronavirus being discussed across the world, and with all the visualizations around it, it is timely that I am speaking with Alberto Cairo. He is a visualization guru and has written the book “How Charts Lie”. This book helps you as a statistician to understand various problems with charts, but more importantly, it helps you to educate your business partners who are not statisticians.
We also discuss the following points:
- How tables compare to charts in terms of “lying”
- How to strike the best balance between showing the details and overwhelming the audience
- What behaviours or habits should we develop to improve our visualizations
About Alberto Cairo
Alberto Cairo is a journalist and designer, and the Knight Chair in Visual Journalism at the School of Communication of the University of Miami (UM). He is also the director of the visualization program at UM’s Center for Computational Science. He has been head of information graphics at media publications in Spain and Brazil.
The author of several textbooks, Cairo currently consults with companies and institutions like Google and the Congressional Budget Office, and has provided visualization training to the European Union, Eurostat, the Centers for Disease Control and Prevention, the Army National Guard, and many others. He lives in Miami, Florida.
Read more about Alberto Cairo
Read a long-form profile at Microsoft News
Listen to this episode and share this with your colleagues!
Alexander: You are listening to the Effective Statistician Podcast, a weekly podcast with Alexander Schacht and Benjamin Piske, designed to help you reach your potential, lead great science and serve patients without becoming overwhelmed by work. Today, we will talk about COVID and how charts lie: an interview with Alberto Cairo. And now the music.
Now, originally I wanted to publish this episode much further down the line. But with all the discussions about coronavirus spreading across the world, and with all the visualizations out there, some of them highly cited and widely used (just think about the Johns Hopkins dashboard or the animations in the Washington Post), there’s so much to be said about it. And so it’s really, really timely that I’m speaking with Alberto Cairo, who is an excellent visualization guru. He’s actually a professor of visualization in Florida, has a passion for this and has written several books. His latest book is called “How Charts Lie”, and it is really, really helpful for every statistician to understand. You will also learn how this book can help you educate your business partners who are not statisticians. So stay tuned for an awesome episode with Alberto Cairo. If you haven’t done so yet, join the LinkedIn group, because there I usually post about who my next interview guests are, and you can sometimes see those posts and suggest questions. Two of my friends actually helped to design the questions for this episode, and maybe you are the one who helps with another episode in the future. So join my LinkedIn group, The Effective Statistician, and follow me on LinkedIn.
This podcast is produced in association with PSI, a community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients. Join PSI today to further develop your statistical capabilities, with access to the video-on-demand content library, free registration to all PSI webinars and much, much more. The reduced rate is only £20 for non-high-income countries and £95 for high-income countries, and there’s so much coming in terms of virtual content. So if you’re working from home and you can’t travel, this is the right place for you. Visit the PSI website at psiweb.org to learn more about PSI activities and become a PSI member today.
Welcome to another episode of the Effective Statistician and today, I’m quite excited because I have an interview guest from the U.S. from Florida, Alberto Cairo. Hi! Alberto, how are you doing?
Alberto: Doing good, how about yourself? Thank you for having me by the way, this is exciting.
Alexander: Yeah, it’s really exciting, and I think the topics that we are talking about today are more important than ever. With all the discussions about corona going on, there is lots of visual communication about it.
Alberto: Yes indeed.
Alexander: There are lots of case studies about data visualization on this topic, so it’s really, really timely.
Alberto: It’s very timely indeed, yes.
Alexander: Although my initial request to you wasn’t driven by this now pretty widespread disease. It was driven much more by the books you’re writing, and that’s really the background to that. But before we dive into the books, maybe tell the listeners a little bit about what created your interest in data visualization in the first place.
Alberto: Sure, I can do that. Well, long story short, I am a professor. I am a professor of visualization, visual explanation and information graphics at the University of Miami. I am a journalist by training, so I have a degree in journalism. I worked in the news media for quite a long time. I was the head of graphics in several media organizations in Spain and later in Brazil, and then I moved to the United States to teach how to create visualizations. But the type of journalism that I used to do was not written journalism. It’s graphic journalism, so what I did was to tell a story by conveying information through maps, through graphs, through visual explanations, illustrations, things like that.
So my entire career has been devoted to explaining things visually, and I’ve always been interested in this type of communication. I began my career doing mostly illustration-based explanations. Imagine that, I don’t know, NASA sent a new mission to Mars or whatever. I would do an infographic explaining what the spaceship was like: a cutaway of the spaceship showing the different parts in a 3D model. So I began my career doing that. But then, around 2009 or something like that, I started getting more and more into the visual display of data, namely data visualization: how to transform quantities into graphics that lead readers to extract or spot insights, patterns and trends in large amounts of data.
So I started seriously studying the field. There’s plenty of research in the field, coming mostly from statistics, from computer science and also a little bit from journalism. So I read a lot about that. I read a lot about cartography and I started writing about it myself, just to clear up my ideas and basically translate what I was reading for a more general audience. Yeah, that’s about it. I’m going to stop talking now.
Alexander: That’s really great, because it directly gives the background for the first books you’ve written. Take Nerd Journalism, where the title really tells where you’re coming from.
Alberto: Yeah, Nerd Journalism is actually not a book. It was never published as a book. Nerd Journalism is essentially the draft of my PhD dissertation, which I made available for free online. It’s basically a study about how news graphics have changed in the past 20 years or so. I describe this transition, which happened to me personally but also happened at a societal or professional level in many media organizations. 20 years ago, newspaper and magazine graphics desks used to produce mostly illustration-based visual explanations. But in the past 10 years, they switched their attention towards data visualizations. And I try to describe why that transition happened and how graphics desks changed.
So it’s not an actual book. It’s a draft of a dissertation. The actual books are The Functional Art, The Truthful Art, and then the latest one is called How Charts Lie, which is my first book for a general readership, the general public.
Alexander: In terms of How Charts Lie, that’s actually quite a provocative title.
Alberto: It is, and I chose the title on purpose.
Alexander: Yeah, I guess so. What inspired you to write that book?
Alberto: Well, it’s obvious that in the past, I would say, five or ten years we have been experiencing a deluge of information, right? The internet has opened the door to both good information and bad information. Good in the sense that anybody who is interested in communicating to others can do it. Everybody has the tools that used to be owned by newspaper owners. The platforms to publish used to be owned by newspaper and magazine owners, but nowadays everybody can become a journalist. Everybody can decide, well, I have this data set or I have this information, and I’m going to put it out. And that is great, that’s a sort of democratization of the spread of information, but it also has some bad consequences. Because sometimes people get things wrong and they still put things out, or there are bad actors who put bad information out on purpose. So I started getting worried about these two phenomena: either bad actors using graphics to misinform the public, or good actors, so to speak, people who had the best of intentions but end up misleading the public anyway, because they make a graphic that is wrong, or they don’t understand the data that they are visualizing, or they read too much into the graphic that they are designing. What I’m explaining is that whenever we see a misleading graphic, it is much more likely that the graphic was designed with the best of intentions, and the designer still got things wrong just because they didn’t understand the data correctly, than that it is a bad actor trying to mislead us. Still, I cover both phenomena: bad actors, and good actors who should know better, so to speak.
So I started getting worried because I saw what happened in the run-up to the 2016 presidential election in the U.S., with Trump using several maps and several graphs. Then there are cases like Sharpiegate, for example, with Trump manipulating a National Hurricane Center map. That came later, after the book was published, but it also illustrates the kind of phenomena that I was identifying out there and that justified the need for a book like this. The title, just like you said, is very provocative, but the book is actually not essentially about how charts lie. It’s a manual on how to better read graphics, how to become a better reader of data visualization. Because one of the problems that I also identified throughout the years is that people, and when I say people I mean average people like myself (I don’t have any expertise in anything other than design and journalism), tend to take data visualizations at face value. We take a look at the visualization and we assume that the visualization is right. We take a quick look at it, we don’t pay a lot of attention to it, and we still assume that we understand it. And what I explain in the book is that this is wrong. We should stop assuming that a data visualization is an illustration that can be understood at a quick glance, and we should approach data visualizations as if they were visual arguments that require as much attention as a written argument, if not more. So I try to equip readers with the tools that they need to approach maps, graphs and charts a little bit more critically and in a little bit more informed way.
Alexander: So our listeners are mostly statisticians, and they could really use this book well to inform their colleagues about how they should read the graphs that are coming from the statistics departments and from medical affairs, but also what is published in medical journals or what you see at conferences, all these kinds of different things. Because there are a lot of visualizations out there, and people presenting them may point out certain parts, but these might not actually be obvious from the visualization. Or the visualization might actually hide something that is in there. I’ve seen lots of presentations lately where you would see things like line charts that show the efficacy of a treatment long term, and all these lines look very straight long term. So the presenter says, here you can see that the treatment effect is maintained over a long period of time. And then you see in the footnote: oh, but they have a denominator that goes down over time in terms of the patients who are treated, and you can see that they basically just show the efficacy for those patients who are still on treatment.
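To make the shrinking-denominator point concrete, here is a minimal sketch in Python. All numbers are hypothetical and invented purely for illustration; they do not come from any study mentioned in this episode. The sketch compares the response rate among patients still on treatment with the rate among all randomized patients. Counting dropouts as non-responders is only one possible convention and is an assumption made here.

```python
# Hypothetical trial: 100 patients randomized (made-up numbers).
randomized = 100

# week -> (patients still on treatment, responders among them)
visits = {0: (100, 60), 12: (80, 52), 24: (60, 42), 52: (40, 30)}


def response_rates(randomized, visits):
    """Return (week, n on treatment, observed rate, all-randomized rate)."""
    rows = []
    for week, (on_treatment, responders) in sorted(visits.items()):
        # Denominator shrinks as patients stop treatment.
        observed = responders / on_treatment
        # Fixed denominator: dropouts are counted as non-responders.
        all_randomized = responders / randomized
        rows.append((week, on_treatment, observed, all_randomized))
    return rows


for week, n, observed, all_rand in response_rates(randomized, visits):
    print(f"week {week:>2}: n={n:>3}  observed={observed:.0%}  "
          f"all randomized={all_rand:.0%}")
```

With these invented numbers, the line based on patients still on treatment stays flat or even rises, while the line with the fixed denominator falls steadily. Both lines come from the same data, and only the choice of denominator differs.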
Alberto: Yeah, those still on treatment. And if you don’t count the ones who stopped the treatment, that’s a great example of a chart that can be greatly misleading. So I would say that “How Charts Lie” is a book for the general public. It’s not a book for statisticians. Actually, I gave it to several friends of mine who have PhDs in statistics to read, and they still had fun with the book and laughed at some of the examples, but it may be a little bit too elementary for a statistician. I may be wrong about that, I don’t know. But it could be useful for a statistician or data scientist to get inspiration on how to teach these skills to other people.
It’s not that you will learn a lot from the book itself. Perhaps you will learn a little bit about the techniques that we could use to translate all this very technical knowledge that you all have into a language that someone like myself can understand.
I have always seen my work as if I were a translator, someone who is a conduit between experts and the general public. So I talk to experts. I try to ask the expert to give me the elevator speech of what it is that they are doing. And then what I try to do is to put that knowledge in common words or in common graphics, graphics that people can understand. So the book may be useful for that. At the same time, there are other books that I think statisticians can take advantage of to learn about the power of data visualization and to learn how to become better visualization designers. There is, for example, Show Me the Numbers, which is a great introduction to data visualization. There is Cole Nussbaumer Knaflic’s Storytelling with Data, which is a very elementary but still fun to read introduction to data visualization.
Alexander: Yeah, excellent book.
Alberto: Yeah, it’s an excellent book, very basic, but it gives you the elementary knowledge. And there is also my previous book, “The Truthful Art”. It’s an introduction to data visualization design for people who want to take their first steps in this field. Because, truly, data visualization has a dark side, which is what I talk about in How Charts Lie, but at the same time it also has a bright side: it can be powerful, it can be useful, it can be persuasive and informative.
And one thing that I say in both books, in the latest one, “How Charts Lie”, and the previous one, “The Truthful Art”, is that I’m a great believer in the democratization of data visualization, in the fact that visualization is a little bit like writing. We all learn how to write correctly to express our opinions, our takes or our results whenever we do research. It can take us a long way to also learn how to visualize those results. Because when we combine visuals and words together, understanding can greatly increase on the part of the reader. So it’s great to learn how to write and to talk. But it is also great to learn a little bit about visual design, or visualization design in particular.
Alexander: Yeah, absolutely. When I talk to other statisticians, some really shy away from creating visualizations, maybe because as a community we are pretty detail oriented. We want to see the second decimal behind numbers and things like that. And I think that’s one of the reasons why we usually provide our results in table format. I recently heard someone saying, “Yeah, a chart is always kind of biased. I would rather have these tables. They are unbiased.” Do you think that tables are less prone to bias compared to charts?
Alberto: Okay. So let me begin by saying again, I’m not a statistician. I have befriended tons of statisticians, I give my books to statisticians to read before they are published, and I still make mistakes anyway. So I cannot talk about that particular instance with any sort of expertise. My intuition, though, is that it is wrong to say that the table is unbiased and the visualization is biased. That’s because a table is essentially another way to visually represent the information. It’s just another kind of visualization, another kind of chart. The table is a chart, the same way that a line chart is also a chart. They are just two different ways to represent quantities, two ways to symbolize quantities. The numbers that you write in the table are symbols that represent quantities. The line chart that you use to represent a time series is another symbol that you also use to represent quantities. So neither of them is more or less biased. It all depends on how you design them, right? On the techniques, on how rigorous you are when you’re designing it.
That said, I understand where the reluctance comes from. Tables have some power to them. Tables, with all the numbers in there, give you all the data that you’re working with, and a line chart or bar chart or pie chart, or any kind of visualization, is an abstraction that you put on top of all those numbers. So I understand that reluctance: a visualization may be too simplistic a representation of all your numbers. But you need to think about it this way. Again, a table is a visual representation, and a line chart is another visual representation. Each one of them can be used differently for different purposes.
One of the things that I try to explain in my books is that visualization is always purpose-driven, and the purpose that you have in mind when representing data should guide your decisions as to how to visualize those data. So a table is an appropriate way to show data when the purpose that you have in mind is to let people see each individual value in the data set. If that is your purpose, a table is a perfect way to represent your data. But if you want people to see the patterns and the trends behind the data, the table on its own, the numerical table on its own, is useless or at least extremely limited. You need to visually encode the data, you need to transform the data, you make the data more physical, so to speak, by mapping the quantities onto graphic forms, to represent them as maps, graphs or charts of different kinds, so people can spot those trends and patterns in the data. It is just a complementary method of representation. Now, one thing that I would like to say... I feel like I’m speaking too much.
Alexander: That’s alright.
Alberto: But I think that’s why I’m passionate about this stuff. There is another reason why I think statisticians are reluctant. Not all statisticians, because some of the best literature about visualization comes from the world of statistics. I’m thinking about William Cleveland and, you know, John Tukey. They all wrote landmark books in the history of data visualization: John Tukey, Exploratory Data Analysis; William Cleveland, The Elements of Graphing Data; Edward Tufte, The Visual Display of Quantitative Information. And they were all written by statisticians. So it’s not all statisticians, but the fact that so many statisticians feel a little bit reluctant about data visualization has to do with the history of statistics. Michael Friendly, who is also a statistician, based in Canada, has written extensively about the history of data visualization. And he says that in the 19th century there was a golden age of data visualization. Florence Nightingale, Dr. John Snow and many other figures, some considered landmarks in the history of statistics, epidemiology and mapping, used graphics extensively to represent data, and Michael Friendly calls this the Golden Age of data visualization.
The challenge is that, according to Friendly, at the turn between the 19th and the 20th century there was what he calls, and I hope I’m not getting the terminology wrong, the mathematical turn in statistics. Statistics became much more mathematical, so to speak, and some statisticians started thinking that visualizations were just illustrations. They were just add-ons, you know, beautifying the data, etcetera. This is not serious, they used to say, let’s just focus on the numbers. Let’s just focus on the mathematical underpinnings of whatever it is that we are doing. And Friendly says that this mathematical turn made visualization look less important in the eyes of statisticians.
And then, in the history of the relationship between statistics and data visualization, beginning in the 1970s and the 1980s, we have this new breed of statisticians who started becoming interested again in data visualization. Again, we have John Tukey, we have William Cleveland, we have Tufte, and so on and so forth, who started writing about visualization from the point of view of statistics. And I would say, and this is just my personal opinion, that nowadays we are in a new golden age of data visualization, perhaps not mainly because of statisticians, but mostly because of journalists. Because journalists, particularly at elite media organizations such as the Financial Times, the Wall Street Journal, the New York Times, the Washington Post, the Berliner Morgenpost and many other news publications, have embraced statistics and visual representations of statistical data, and it has been proven that the public really likes data visualizations.
The other day, actually, and this is a little bit of a factoid, but it was just announced a couple of days ago: the most read piece ever published by the Washington Post online is a data visualization, a simulation of coronavirus spread in a population. It is the most viewed piece ever in the Washington Post online. And we have many factoids such as this one. Several of the most popular stories ever published by the New York Times are data visualizations. So there is a reason why we should all try to embrace data visualization a little bit more. It is a powerful means to extract meaning from data, but it is also a powerful means to communicate those data to the public. And one of the functions of statistics, I think, is not just to analyze the data, it is also to communicate to other people whatever it is that you have found. And graphics are very powerful at doing that.
Alexander: I completely agree. I will put the link to all these papers and books that you mentioned into the show notes.
Alberto: I can send you the links to the Michael Friendly papers I’ve been mentioning, in which he talks about the Golden Age of visualization. I have all these links. So yeah, let’s share them with your listeners.
Alexander: Yeah, and I absolutely love the Washington Post article about the spread of corona and how different social distancing measures can help to reduce the spread. It’s a really nice animation that is easy for people to understand, and it combines these flowing dots, which represent the people who get in contact with each other, with these stacked area charts that are animated over time. It shows you how many people there are, how many turn into patients, how many recover, how many die. It’s a really nice visual that gets cited across the world and is recommended quite widely, and I completely understand that this is one of the most accessed pieces in the world.
Alberto: Yeah, the most accessed, and it’s being translated into many languages. It literally went viral, although that is not a very appropriate way to describe it, but it went viral online. It became so powerful, so popular.
Alexander: It was in the main news on TV here in Germany. Yeah, so it tells a lot.
Alberto: Yeah, it tells a lot.
Alexander: One interesting thing is that this piece also has animation in it. And I see more of these non-static graphs appearing, you know, these nice flowing bubble charts on FlowingData, where you see what Americans do over the day. There are also some really nice visualizations in the medical field. Colleagues of mine on my former team have prepared something that shows how patients evolve over time in terms of their symptoms, and you can see for the individual patients who gets better or worse. So one of the questions I had was: when are these animation features really helpful, and when do they become more of a distraction?
Alberto: Well, the short answer to that is to go back to one of the answers that I gave you a minute ago, when I said that visualization is purpose-driven, right? Whatever features you include in the visualization, whether it is color or animation or interaction, they should all have a purpose. So if you think that your visualization can benefit from animation because it will become clearer, then by all means use animation. If you think that you need interaction, by all means add interaction. But don’t add animation just for the sake of making things move, because that will be a distraction, right? The power of the Washington Post piece is that the animation makes a lot of sense, because if we are talking about contagion, we are talking about people moving around and giving the disease to other people. So you need to show those people moving, and it makes sense to use animation in that case. The animation on the stacked time series area chart also makes sense, because it reveals the increase of cases over time little by little. So the animation in that graphic makes a lot of sense, but that doesn’t mean that any graphic that has animation will become super powerful and super popular, and it also doesn’t mean that we always need to use animation to make a graphic popular.
I think that the most iconic image of 2020 is going to be the flatten the curve graphic that we have all seen, right? I predict it is going to be the most iconic image of 2020 and probably one of the most iconic data... well, it’s not a data visualization, it’s an abstract representation, right? It’s not based on actual data, but it’s still a visual explanation, a graphical explanation. I predict that it is going to become one of the most iconic explanation graphics in history, and it’s not animated. It’s as simple as two curves. These are two curves, completely static, and that’s it. You don’t need anything else. Do you need animation in that case? No, you just need the curves. Animation in that case would be a little bit of a distraction. It doesn’t add anything to the understanding or to the appeal of the graphic.
So again, going back to my answer, whenever we need to think about what to include or what not to include in a visualization, we should always go back to purpose. What is that visualization for? Does that animation add anything? Does it make my graphic more understandable, or more attractive without making it less understandable? Then by all means include it. But if including that feature, animation or interaction, makes your graphic more distracting or less understandable, because it clutters the presentation, for instance, you should not include it.
Alexander: I like to see it from the design standpoint that Cole also talks about in her book. It’s about taking away as much as possible. So I would do it the same way with the animation. Can I take it away and the graphic still conveys the message in a very powerful way?
Alberto: I’m going to interrupt you there, though, and say that I am also an advocate for that strategy. However, I also think that we can go too far with that strategy. And that may lead us to graphics that are too spartan, so to speak, or too bare, right? There is value sometimes in adding a little bit more color, a little bit more illustration, a little bit more visual appeal to a graphic, just for the sake of visual appeal, right? So some things that we may include in the graphic may not have a purpose related to making the graphic more understandable. But there may be a purpose in making the graphic more attractive to certain types of readers, right?
Alexander: I have a nice example for that, as you talked about the flatten the curve graphic. I’ve seen versions where people put in two cats.
Alberto: Yeah, exactly two cats.
Alexander: They were saying, you know, people don’t look at graphs, but they look at cat videos.
Alberto: That’s a little bit too much for me, but I saw another version where someone combined the flatten the curve graphic with two cartoon characters talking to each other about the graphic, right? So that is fun, right? It adds color, but at the same time it makes the graphic warmer, so to speak, more humane. Sometimes the problem with data visualization is that it looks so geometric. It looks so mathematically oriented. It is boring, so to speak. It is not boring for me, I love data visualization, and it is not boring for you either, right? Because we deal with all this stuff every single day. But for many people statistics are sometimes a little bit threatening, or they look incomprehensible. So adding touches of humanity to our graphics can help: sometimes a little illustration in one corner of the graphic that doesn’t obscure the data, or adding something to the graphic itself, such as an explanation of how to read the graphic, for instance, or adding a human face to the data. I’m thinking now about, for instance, Professor Hans Rosling.
Alexander: Oh, yeah.
Alberto: So as long as I’m talking to statisticians here, we cannot avoid mentioning Rosling’s work. Professor Rosling was a professor of international health in Sweden. He was a statistician and also a medical doctor, and he became incredibly famous in the mid-2000s because of the way he presented data. He designed visualizations with the help of his son and his daughter-in-law, but at the same time he didn’t just show the visualizations: here is my graphic. He put himself in front of the visualization and added this human layer that I’m talking about. In visualization we call this, by the way, the annotation layer. So he provided verbal explanations of what the graphic was showing, how to interpret it, the main facts that you should not miss in the graphic: take a look at this, take a look at that. And he did that with a humorous tone sometimes, and also with a very warm voice. He was a warm person, or at least a warm presenter, a friendly-looking presenter.
That’s what I mean by adding the human factor to any visualization whenever it is appropriate. And whenever it is possible.
Alexander: I think I still have this video that he did with the BBC about life expectancy, and..
Alexander: Yeah, it’s so powerful and it’s so nice to watch, in this loft setting that he gets into, and then he explains the graph very nicely. It’s one of the examples that inspired me to actually do much more in this visualization area, and I hope it inspires lots of others.
Alberto: He also inspired me.
Alberto: Yeah, back in 2005 and 2006, when he became so popular.
Alexander: Yeah, and nowadays we can all use these things as well, because we have tools like these free libraries. We have many other tools. We have lots of options in R. We have lots of options in SAS, which is mostly used within the pharma industry. We have tools like Tableau. We have tools like Qlik Sense and lots of others that we can use to better visualize our data in a straightforward way. It’s not like, you know, maybe 20 years ago, when it really took a lot of hassle and time to actually do it. There is much more available now, and I think people should embark on it and consciously explain the data in a better way. Because I think if we as statisticians know our data inside out, we also have some responsibility to explain the data in the best way, so that people take away the key learnings, with all the strengths and limitations of the data, rather than just providing the tables to someone who, you know, further down the process sets up some visualizations, not knowing about these strengths and limitations anymore and being quite far away from the original creation of the results.
Alberto: It is so easy nowadays, right? As you’re saying, it’s relatively easy nowadays to start doing this type of design, thanks to all these tools that are available. I also use R, but I use many other tools, not only that. I usually use Adobe Illustrator, which is the design tool that every graphic designer uses. I use Datawrapper, which is a freemium tool that was created in Germany, by the way. Datawrapper is a great interactive data visualization tool. I use Flourish, which is another freemium tool that lets you create interactive data visualizations for the web quite easily. There is Tableau, there is Power BI, there is JMP from SAS. There are so many options nowadays, and some of them are better than others at doing certain things. But any of them can open the door for anybody to get started doing this kind of work a little bit more.
I would like to warn your listeners about something, though. When I advocate for everyone and anyone to start doing data visualizations, I don’t mean that everyone needs to become an expert in data visualization, right? What I do mean is that everybody can benefit from learning at the elementary level, from learning the basics of how to create a good chart or a good map. That’s something that can be learned, and can be learned relatively quickly, but there is also a place for specialization. So in any statistical team, I think that there should still be people who specialize in knowing the data really well and analyzing it and understanding it. So there is a specialization there, in the same way that there can be a specialization among statisticians of people who don’t analyze the data so often, but who specialize in communicating the data, right? So those are two possible specializations, or maybe there are more. But anybody and everybody can benefit from learning the basics of all that.
Alexander: That’s actually a very good point. There are a couple of community projects going on where you can learn to become better at visualization, like Makeover Monday, Tidy Tuesday, and Wonderful Wednesdays, which the PSI Visualization Special Interest Group recently started.
Alberto: Yeah. Sorry. There’s also the Data Visualization Society.
Alberto: They also have a forum, and you can share your work, learn from others, and ask questions to the community. The visualization community, or communities, are quite similar to the R community, in the sense that they are quite open, quite horizontal, and in general, people tend to be very helpful and nice to each other. So if you ask openly, “Oh, I don’t know how to do this in R,” or whatever, it is very likely that someone in the R community will have an answer for you, and they will send you some recommendations on how to do it. The visualization community, or communities, because there may be several, are also quite open and quite helpful. So if you share stuff, for example in the Data Visualization Society forums, it’s very likely that people will reply with constructive suggestions and ideas, and tutorials, and resources that you can use. So that also makes learning all this stuff a little bit easier and more exciting.
Alexander: Yeah. In terms of getting to a good visualization, what does your personal process look like? Do you dive directly into R to create something, or how do you go about it?
Alberto: No, that comes much later. These steps will sound a little bit obvious, but I think that it may still be helpful to lay them out clearly. So the first step is understanding the numbers, obviously, and that’s something that I have learned the hard way not to do on my own because, again, I’m not a statistician, I’m not an expert or anything. So what I always do, or always try to do, is to communicate with sources. Actually, this is a piece of advice that I gave publicly recently. When talking about visualizations of the coronavirus spread, I said on my blog: if you want to design anything related to the coronavirus, don’t just assume that you can download the data from a public source and visualize it away. Don’t do that. I mean, you can visualize it, but don’t publish it, right?
If you are going to publish it, first of all consult with epidemiologists. Ask epidemiologists about the graphic and about the data that you’re using, and make sure that whatever it is that you’re visualizing has been vetted by experts before you publish it, because otherwise you will probably make a mistake. That’s a recommendation that I always give journalists: always partner up. Anything that I do, anything that I write, anything that I visualize, particularly if what I write about or what I visualize may have consequences for people’s lives, is always vetted by experts, just in case. And even so, I still make mistakes sometimes, even following that process.
All right, so once you have understood the data, you know what the data is hiding: the main patterns or trends that are worth highlighting, the main insights that are worth highlighting, the main “stories”, in quotation marks, that hide behind the data.
Once you know those, that is when you start thinking about the visualization. That is when you start thinking: how should I encode my data? How should I transform my quantities into graphic forms? It may take a few hours to explain this, so I’m not going to do that. I will just refer to the books that I mentioned before, which talk about how to choose encodings: when should I use line charts or bar graphs or pie charts or maps, etc. There are several techniques that can be used for that. One of the most common mistakes that beginners make, by the way, is to work on autopilot.
I have several examples of that. For instance, suppose I give my students a data set with obesity rates, income levels, whatever, and one of the variables is geographical: countries, counties, regions of the world, or whatever. I tell them, “Design a visualization,” and I know that the first visualization that they are going to design right away is a map, right? Because the data set contains geographical variables. And when they show me the example, I ask them: why is it a map? Because, as I said before, visualization is purpose-driven. So what’s the purpose of your visualization? To show geographic patterns in the data? If that is the case, then a map is the right choice.
But what if the purpose of your graphic, or your visualization, is not to show geographic patterns? What if the goal of your visualization is to compare the different regions, or to rank the different regions, or to do something else with the data? Then the map will not be the solution. There is another way of encoding it: the way they should show the data is through a bar graph, or through a table, or through a line chart. This is why visualization is always purpose-driven. Before you design anything, write down what you want your visualization to communicate.
So visualization usually begins not by drawing anything. It begins with writing, right? Writing the purposes of your graphic, writing a short...
Alberto: Writing a short description of the main takeaways that you want people to get from your visualization. Based on that is when you can make design decisions, when you can make choices about encodings, about arrangement, about layout, about structure, and so on and so forth, and about colors and about style. And it all boils down to purpose.
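Editor's illustration: the point above about ranking regions, rather than mapping them, can be sketched in a few lines of Python. This is a minimal sketch and not something discussed in the interview: the region names, the rates, and the helper `ranked_bar_chart` are all hypothetical, and the text output only stands in for a real charting tool. It assumes all values are positive.

```python
# Hypothetical data: region names and rates are made up for illustration only.
obesity_rate = {
    "Region A": 31.2,
    "Region B": 24.5,
    "Region C": 36.8,
    "Region D": 27.9,
}


def ranked_bar_chart(values, width=40):
    """Return text lines for a bar chart sorted from highest to lowest value.

    Assumes every value is positive, so the largest value is a valid scale.
    """
    top = max(values.values())
    label_width = max(len(name) for name in values)
    # Sorting is the design decision that serves the "rank the regions" purpose.
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, value in ranked:
        bar = "#" * round(value / top * width)
        lines.append(f"{name:<{label_width}} | {bar} {value:.1f}")
    return lines


# Purpose written down first: "let readers rank the regions by obesity rate".
for line in ranked_bar_chart(obesity_rate):
    print(line)
```

A map of the same four values would show where the regions are, but reading their order off the colors would be harder. The sorted bars make the ranking the first thing a reader sees, which is the stated purpose in this sketch.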
Alexander: I think that is a really important point that many leave out, and that then leads to lots of frustration downstream. Because if you don’t have a clear purpose, a clear takeaway that you want to convey, then you never know whether you’re heading toward the right graphic. And if you work in a team, everybody has different goals that they want to communicate. Let’s say one person wants to communicate the fast onset of the medication. Another one wants to communicate that it has long, stable maintenance, and yet another one wants to communicate that the response is consistent across different subgroups. For each of these three things, you would come up with a very different visualization.
Alberto: And they may be complementary to each other.
Alberto: In a case like that, it may be worth thinking not about one single graphic, because it’s impossible to communicate all that through a single visualization, but about several visualizations that are complementary to each other.
Alexander: But what often happens is that you do the first one and get feedback, then you go to the next one and get contradicting feedback, and then you go to the third one and get yet other feedback, because you haven’t clearly defined, and clearly agreed within your team, what the goal is.
Alberto: The goal of the graphic.
Alberto: And when we talk about the main takeaways, I don’t really mean that you need to provide a single story. The takeaway could be the task: what is the task that the visualization enables? That’s the takeaway. What does the visualization let you do with the data? What does it let you see in the data? It could be the ups and downs in a time series, in which case you’ll need a line graph, and so on and so forth. So you don’t need to make an explicit point in your visualization. The takeaway of your graphic could be just enabling a task, seeing something in the data.
Alexander: Yeah. That’s a very important point. Okay, once you have that done, what do you do next in your workflow?
Alberto: Well, depending on how large the project is, I also think about creating several graphics, not just one, depending on the complexity of the messages that you’re going to communicate. So you may want to think about how to build a narrative based on graphics and words, right? Some people like to use the term story building. Story is a term that we use all the time in journalism: let’s build this story, right? When we talk about stories, obviously we don’t talk about fictional stories, things that are made up. We are talking about narrative. So how do you build a narrative that may help readers make sense of the data? So I start thinking about how to arrange several graphics in a layout, or how to arrange them sequentially and linearly, first, second, third, etc., and how they connect to each other in a narrative way.
I also think about the annotations that I can add to the graphics to put the data in context. And I think about a methodology section. Even when communicating with the general public, I think it is worth adding a short methodology section at the end of a news story that uses data. This is something that news organizations are already doing. FiveThirtyEight and ProPublica, for instance, in the United States: whenever they deal with large data sets, they always include a methodology section in their news stories, as if it were a scientific paper. Only it’s obviously not as detailed and as technical as a scientific paper, but it follows sort of the same idea: let’s be transparent about the data and about how we have manipulated and explored the data. So I think that that is worth doing.
Basically, the narrative is the next step, and it is also worth, if possible, testing your graphic, that is, showing your graphic to people. This can be done scientifically or non-scientifically, systematically or non-systematically, so to speak, and I think that both are valuable. So if you can do it systematically, depending on how ambitious your project is, it may be worth putting together a survey or a formal focus group, etc., to show your graphic to people who you believe are representative of your readership, just to get their reactions. But if you cannot do it formally, just show it to friends, family, acquaintances, and let them read the graphic. Without guiding them, just show them the graphic and say, “This graphic is about such and such, and now read it. Let’s talk five minutes from now, and tell me what you learned from the graphic.” Just having that conversation, as unscientific and unstructured as it sounds, can still be extremely valuable, because it can show you what an average person, like myself, may get right or get wrong from the graphic that you are designing. So testing is also part of that process.
Alexander: Yeah, that’s a very important point. I’ve done that in the past as well, where I’ve shown graphics to physicians. I remember one case where it was about psoriasis, showing psoriasis data to dermatologists. They were directly jumping at the colors and saying, “Oh, yeah, the patients need to go from red to green.” And that’s not because of the traffic light. It’s actually because red, if you talk about psoriasis, is associated with the red, scaly plaques on the skin that are psoriasis.
Alberto: Yeah, they are associated.
Alexander: And green kind of makes people think of clear skin. And these things you only know when you actually talk to your readers, because red might mean something completely different when you talk, for example, to cardiologists or to other physicians. So these associations you only get through co-creation. And of course, you can also check how fast people can really take it in.
Alberto: Yeah, one thing connected to what you’re saying. I’ve been working on visualizations of different kinds, again, infographics, visual explanations, data visualizations, solutions transformation, for more than 20 years now. And being so experienced, one thing that I have learned the hard way is to be much more humble about my own assumptions. At the beginning of my career I was full of hubris. I knew everything, I knew how to communicate everything, people will understand this graphic, etc., etc.
But throughout the years I learned, and I like to repeat this sentence in my talks, that what you design is not what people see. And the only way you can make sure that what you design is what people see is if you talk to people, if you talk to the people who are going to consume your graphic. Now, that doesn’t mean, and I also warn people about this in talks and in my books, that you need to oversimplify your graphics to make them more understandable, right? Sometimes the solution to a misunderstanding on the part of the reader is not to change the graphic that you are designing, but to include an explanation of how to read the graphic. And the reason why I’m saying this is that I come from the news media world.
Many years ago, I tried to use a scatter plot in a story, right? A scatter plot is a super traditional way to represent data. Anybody in statistics knows how to use a scatter plot, but 10 or 15 years ago scatter plots were nowhere to be seen in news publications. And the reason why that happened is that news editors were very reluctant to use that graphic form because, they said, and it is a very good argument, people will not understand this graphic. So let’s not publish a scatter plot, just because readers will not understand it. There is wisdom in that decision, but at the same time it is also self-defeating. Because my argument has always been: if you never show a scatter plot to people, how are people going to learn how to read a scatter plot?
Alberto: It’s a self-fulfilling prophecy. So, if you believe that the graphic form or the visualization that you are using is the best way to represent your data and you haven’t found a better way, don’t get stuck on the fact that people don’t understand it. If that is the best way, according to your own judgment, to represent the data, use that graphic, but then be prepared to explain how to read it. This is why I think, again going back to Hans Rosling, that Rosling also showed the way. In the BBC video that you mentioned, if you remember, in that short snippet from his documentary, The Joy of Stats, the first thing that he does is not to show data. The first thing that he does is to explain how to read a scatter plot. Here is the x-axis, here is the y-axis, here are a bunch of bubbles, each one of the bubbles represents a country, and the size of the bubbles is proportional to the population. He was explaining the grammar of the graphic before he showed any data, and then he showed this kind of plot, right? That’s how you increase visual literacy, which is something that I really, really care about: increasing visual literacy among the general public.
So that’s public service, right? So again, don’t refrain from using a complex visualization just because you know that people will not understand it. First of all, try to find better ways to represent the data, but if after trying different things, you still think that your original graphic is the best way to represent the data, use it, but then explain it.
Alexander: Awesome! That is a very good summary, actually. Also, in terms of the key takeaways from today: we need to constantly improve our skill sets in terms of becoming better at explaining things. We need to test what we are doing with others. We need to invest time to make sure that the goals that we want to communicate with this visualization are actually met, and that we work together with others to make sure that we reach this goal. Thanks so much for this awesome interview. Is there any final takeaway that you would like to share with our listeners?
Alberto: I actually have another final takeaway: have fun. Visualization is not only informative, powerful, and useful, it’s also fun. It’s fun to create maps, it’s fun to create graphs, it’s fun to create information graphics and visual explanations. So learn to enjoy what you are doing, and you will be a better professional because of that, if you enjoy what you do, right? So that would be another final piece of advice for everybody, and thanks again for having me. This has been a lot of fun.
Alexander: Thanks so much.
This show was created in association with PSI. Thanks to Reine, who helps us with the show in the background, and thank you for listening. Please visit theeffectivestatistician.com to find all the show notes and the references that we discussed, and to learn more about our podcast to boost your career as a statistician in the health sector. And please tell your colleagues about it. If you like this podcast, spread the word so more can benefit from it. And as always: reach your potential, lead great science, and serve patients. Just be an Effective Statistician.