Visual Analytics in Effects of Gross Domestic Product to Human Immunodeficiency Virus Using Tableau

As data becomes more accessible, visualization methods are needed to make the information easier to understand. Analyzing and visualizing data makes it possible to understand a dataset without reading through it and to elucidate connections between two or more datasets. Tableau is one of the most popular interactive data visualization tools. With Tableau, it is easy to find correlations between datasets, reorganize datasets by pivoting or joining them, and create visualizations such as geo map charts, geo bubble charts, table charts, line charts, pie charts, and treemap charts. This project aims to show the correlation between a country’s gross domestic product (GDP) and human immunodeficiency virus (HIV) through Tableau. Large datasets related to GDP and HIV were gathered from open data sources. The data were cleaned with Tableau and Excel, and correlations between datasets are shown through various charts in Tableau.


I. INTRODUCTION
A lack of socioeconomic resources can be linked to riskier health behaviors, which can lead to the contraction of human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS). These behaviors include substance use, which can reduce the likelihood of using condoms. Additionally, living in poverty can result in food insufficiency, which can contribute to HIV infection. According to research available through the National Center for Biotechnology Information, "food insecurity, defined as persistent lack of access to adequate food in needed quantity and quality, undernutrition, and HIV/AIDS are overlapping and have additive effects. Over 800 million people worldwide are chronically undernourished and over 33 million are living with HIV infection" [1].
HIV is an epidemic in certain poor urban areas across the United States. The Joint United Nations Programme on HIV and AIDS (UNAIDS) identified general populations with more than 1 percent prevalence of HIV [2]. Within the same neighborhoods in those cities, people below the poverty line are twice as likely to be infected as people with incomes above the poverty line. This is not new information, since most health officials have long believed that poverty is a key driver of HIV. However, there were not many large projects to support this belief with factual data. In the past, infection rates were suspected to be correlated with racial and genetic makeup. However, this belief has been disproven, as studies have shown that there were no statistically significant differences between infection rates by race or ethnicity [3]. Thus, people of color are affected by the aforementioned diseases because they are more likely to be impoverished, and not because of their race or ethnicity. According to the Black AIDS Institute, "when other racial ethnic groups face the same social determinants of health as Blacks the social and economic conditions within which they live and that impact their well-being. Their HIV rates rise to similar levels as those of Blacks, even for Whites, whose rate of infection is normally substantially lower than rates for both Blacks and Latinos." [4] In a paper recently published in the journal Lancet, psychiatrist Wayne Fenton analyzed factors causing HIV and AIDS and concluded that reducing poverty may be the only viable long-term response to the epidemic. This research has been augmented by recent studies based on statistical correlations of epidemiological and socioeconomic data. Studies have correlated the prevalence of HIV infection directly with wealth. According to Dr. Chin, who helped analyze data from Kenya, the "national HIV prevalence rates appeared to correlate directly with national income across sub-Saharan Africa" [5]. More recently, Mishra et al. analyzed HIV infection prevalence in relation to a person's socio-economic status using national survey data for eight African countries. They concluded that there was an association between socio-economic status and the prevalence of the infection [6].
An impoverished community can face many challenges when living with HIV. Unstable housing, food deficiency, and a lack of consistent access to quality health care can make it difficult to cope with everyday life [7]. The cost of antiretroviral drugs for managing the infection can range from $10,000 to $15,000 per patient per year [8]. Therefore, the support that an HIV patient receives depends on the economy of their particular country; a poorer country can offer less treatment.

II. METHOD
Data were retrieved from three online open data sources: The World Bank Open Data, Our World in Data, and World Population Review. The datasets for GDP were retrieved from the World Bank [9], the datasets for HIV and AIDS were retrieved from Our World in Data [10], and the datasets for the population of the world by country were retrieved from World Population Review [11]. Each dataset was in either CSV or Excel format.

Visual Analytics in Effects of Gross Domestic Product to Human Immunodeficiency Virus Using Tableau Eunbi Kim and Ching-Yu Huang
A. Cleaning the Datasets
After all the datasets were gathered, they were opened with either Excel or Tableau to check whether they were easily understandable. For the most part, the CSV and Excel files were easily interpretable; they may have included a short explanation, title, header, or a null value for data that was not collected. Some of the data points required cleaning; for instance, the terms "US" and "United States" referred to the same country, as did "South Korea" and "Republic of Korea". Additionally, Tableau sometimes cannot recognize whether a dataset has a header, or how the data relate to each column. Therefore, the datasets must be cleaned, and the country names must be standardized, before they can be loaded into Tableau. Moreover, if a country's name has changed, Tableau will skip that country when it tries to join the data with other datasets. Fig. 1 shows the data before it was cleaned. Because Tableau could not recognize the header row, there are many gaps and formatting errors. Fig. 2 shows the result after cleaning up the data. Fig. 3 shows a map of the data before it was processed, in which countries that have undergone name changes are missing. Fig. 4 is a world map after standardizing all country names.
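The standardization step described above was carried out in Excel and Tableau; the following is only a minimal Python sketch of the same idea. The alias table, the choice of which spelling is treated as canonical, the `country` column name, and the sample rows are all assumptions made for illustration and are not taken from the study's files.

```python
# Hypothetical alias table: only the two name pairs mentioned in the text.
# Which spelling is treated as canonical is an assumption of this sketch.
COUNTRY_ALIASES = {
    "US": "United States",
    "Republic of Korea": "South Korea",
}


def standardize_country(name):
    """Return the canonical country name for a raw label."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned, cleaned)


def clean_rows(rows):
    """Drop rows without a country label and standardize the rest."""
    cleaned = []
    for row in rows:
        country = (row.get("country") or "").strip()
        if not country:
            continue  # skip blank or title rows that have no country
        cleaned.append({**row, "country": standardize_country(country)})
    return cleaned


# Illustrative rows only; these are not values from the study.
raw = [
    {"country": " US ", "year": "2000"},
    {"country": "Republic of Korea", "year": "2000"},
    {"country": "", "year": ""},
]
cleaned = clean_rows(raw)
```

Applying one alias table to every dataset before joining keeps the country key consistent across sources, which is what prevents countries from being dropped in the join.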

B. Joining Datasets
After cleaning up all the datasets, the next step was to join the tables to find a correlation between the datasets. An inner join returns only those rows for which the key of one table equals the key of the other; rows without a match in both tables are dropped. This join requires a comparison operator to match rows from the participating tables based on a common field or column in both tables. Fig. 5 shows the inner join of the three tables by country and year.
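The join itself was performed in Tableau. As a rough illustration of what an inner join on country and year does, here is a small Python sketch; the function name `inner_join`, the column names, and all values are hypothetical placeholders rather than data from the study.

```python
def inner_join(left, right, keys=("country", "year")):
    """Inner-join two lists of dict rows on the given key columns."""
    # Index the right-hand table by its key tuple for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        # Only left rows with a matching key on the right are kept.
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Placeholder tables; the values are invented for illustration.
gdp = [
    {"country": "A", "year": 2000, "gdp": 100.0},
    {"country": "B", "year": 2000, "gdp": 200.0},
]
hiv = [
    {"country": "A", "year": 2000, "living_with_hiv": 10},
    {"country": "C", "year": 2000, "living_with_hiv": 30},
]
population = [
    {"country": "A", "year": 2000, "population": 1000},
    {"country": "B", "year": 2000, "population": 2000},
]

# Chain two joins to combine the three tables.
merged = inner_join(inner_join(gdp, hiv), population)
```

In this sketch only country "A" survives, because it is the only key present in all three tables. This mirrors why a country whose name differs between datasets disappears from the joined result.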

C. Choosing Data for Visualization
The Tableau software provides data visualization for multiple datasets. After joining the tables, one can use the software to view the data. On the left side of Tableau, dimensions and measures let the user choose which data to display, as shown in Fig. 6. When the user clicks on some data, Tableau automatically shows which charts are available for the selection on the right-hand side, as shown in Fig. 7. To be used as geographical data, a dataset must include a country or a specific latitude and longitude.

III. EXPERIMENT RESULTS
The final dataset consists of 2987 rows. Each country contains data from 2000 to 2017, and the countries span 6 continents. The HIV/AIDS datasets include deaths from HIV/AIDS, new infections of HIV/AIDS, and the number of people living with HIV. To see the results for HIV/AIDS and GDP in the map chart, one must select the country from the dimensions and then select the number of people living with HIV/AIDS and GDP. The circle size represents the prevalence of HIV/AIDS and the color represents the GDP of the country. A larger circle represents a higher number of people living with HIV/AIDS, and a darker color represents a higher GDP (Fig. 8).

A. Visualization in GDP and HIV/AIDS
Tableau also supports line charts. A line chart is commonly used to display change over time as a series of data points connected by straight lines over two axes. The line chart therefore helps to determine the relationship between two sets of values, with one set being dependent on the other. For this dataset, the independent variable is the range of years from 2000 to 2017, while the dependent variable is the GDP of a continent. Fig. 9 compares GDP with HIV and AIDS; as the graph shows, Africa has the lowest GDP but the highest number of cases of HIV and AIDS.
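A continent-level line chart needs one value per continent per year. The sketch below shows one way such a series could be built in Python, assuming country rows are summed within each continent; the text does not state which aggregation Tableau applied, and the function name, column names, and values are hypothetical.

```python
from collections import defaultdict


def total_by_continent_year(rows, value_key):
    """Sum one measure over all countries for each (continent, year)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["continent"], row["year"])] += row[value_key]
    return dict(totals)


# Placeholder rows; the values are invented for illustration.
rows = [
    {"continent": "Africa", "country": "X", "year": 2000, "gdp": 1.0},
    {"continent": "Africa", "country": "Y", "year": 2000, "gdp": 2.0},
    {"continent": "Africa", "country": "X", "year": 2001, "gdp": 1.5},
    {"continent": "Asia", "country": "Z", "year": 2000, "gdp": 5.0},
]
gdp_totals = total_by_continent_year(rows, "gdp")
```

Each key of `gdp_totals` is a (continent, year) pair, so the values for one continent, ordered by year, form the points of one line in the chart.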
Tableau allows a user to filter for a specific continent or country. Fig. 10 is filtered for Africa. As GDP increases, the rate of new infections drops. However, other continents do not exhibit a similar effect on infection rate from a similar increase in their GDP. For example, Asia's GDP increases, but its infection rate does not drop in proportion to the increase in GDP. Also, the correlation is not perfectly negative. According to this chart, GDP has an effect on HIV and AIDS; however, GDP is not the only reason for the reduction in new infections. Other factors can affect whether the rate of infections increases or decreases, such as a country's therapeutic support in the prevention of the disease. GDP reflects the strength of a country's economy, which in turn affects its ability to support patients and mitigate the spread of the disease. The therapy data show that Africa increased its therapy support at the same rate as its GDP.
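The strength of a relationship like the one described above can be summarized with the Pearson correlation coefficient, where -1 is a perfectly negative linear relationship and 0 is none. The following Python sketch is an illustration only: the text does not state how the correlation was computed in Tableau, and the two series are invented numbers, not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented yearly series: GDP rises while new infections fall.
gdp_series = [1.0, 2.0, 3.0, 4.0]
steady_decline = [8.0, 6.0, 4.0, 2.0]   # falls linearly with GDP
uneven_decline = [8.0, 7.0, 7.0, 6.0]   # falls, but not linearly

r_steady = pearson(gdp_series, steady_decline)
r_uneven = pearson(gdp_series, uneven_decline)
```

A coefficient strictly between -1 and 0, as for the uneven series, corresponds to a correlation that is negative but not perfectly so. A coefficient alone does not show that GDP causes the change.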

IV. CONCLUSION
The charts and maps that were created using Tableau show that GDP not only has an effect on the rate of HIV and AIDS cases but also affects a country's ability to invest in the advancement of care and treatment for the disease. Fig. 9a and 9b show that all continents are affected by their GDP, especially Africa, where the number of HIV and AIDS cases was visibly reduced. The line chart in Fig. 10 shows that an increase in GDP was associated with a decrease in the number of infections. Fig. 11 is an example of one of the countries in Africa where one can see the correlation; it shows a decrease in the number of cases when the GDP begins to rise. However, there are also some exceptions to the rule. Fig. 12 is a line chart for Asia. Asia shows a large increase in GDP, but the number of HIV and AIDS cases did not drop as much in relation to its GDP growth. Fig. 13 shows a correlation trend line between the GDP of China and the number of infections. China, the biggest country in Asia, shows a downward trend, but a weaker one than Africa's. Many similar countries with large populations show only a slight downward trend when their GDP is compared with their cases of HIV and AIDS.
In June 1981, the United States received its first report of AIDS. Soon thereafter, the number of cases and deaths among people with AIDS increased rapidly during the 1980s, followed by a substantial decline in new cases and deaths in the late 1990s. The CDC analyzed the reported cases from 1981 through 2000 and found that the number of cases rapidly declined starting in 1990. They also noted no relation between infection rates and race, age, or gender. Many of the infections were caused by misuse of needles and cross-contamination of blood. At that time, it was not known that infected blood could transmit HIV, which caused a rapid increase in the infection rates. According to the CDC, "in 1985, the first federal resources dedicated to HIV prevention were made available to all state and local health departments nationwide. In 1987, a national effort to educate the public about HIV and AIDS was launched and the CDC created a comprehensive AIDS information resource, the CDC's national AIDS hotline and National AIDS Information Clearinghouse. Comprehensive school-based HIV education to inform and educated young people began in 1987, and funding for national, regional, and community-based organizations began in 1988" [12].
The experience in the US shows that a government's efforts to prevent the spread of the disease and to provide therapy are effective in reducing HIV and AIDS. Fig. 14 and Fig. 15 show that when Africa's GDP increased, its ability to provide therapy increased as well. However, Asia did not increase its therapy at the same rate as Africa; thus, its infection rate did not see as much of a reduction.
This study supports the conclusion that a country's GDP has some effect on the rate of HIV and AIDS transmission. When its GDP increases, a country can prevent, or at least minimize, the rate at which the disease spreads through therapy and education about the virus. The initial expectation of the study was that only GDP has an effect on HIV and AIDS; however, the final outcome shows that what a country does with its GDP is also a determining factor in the trend of the infectious disease.
Eunbi Kim is a former student of Kean University where she graduated Magna Cum Laude with a degree in computer science in May of 2020. Before studying at Kean University, she was living in her home country of South Korea.
During her time at Kean University, she was the treasurer of the Association for Computing Machinery Women. She was accepted into and participated in the IEEE MIT undergraduate research technology conference and she was accepted into the Council on Undergraduate Research conference in order to present her sentiment analysis.
Based on her academic excellence, she was accepted into the mentorship program as a tutor and advisor for the computer science department. She also became the recipient of the international student scholarship for the 2019-2020 academic school year.