1.0 Introduction

Background

Text mining refers to the discovery of knowledge from textual data. Textual data contains abundant qualitative information that is difficult to use in statistical modeling (Ghosh, Roy, & Bandyopadhyay, 2012). Text mining enables the conversion of text into numeric formats that can be easily used for analysis. According to Ghosh et al. (2012), most information (over 80%) is currently stored as text, hence the need for text mining techniques.
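As an illustration of what converting text into a numeric format can look like, the following is a minimal bag-of-words sketch. It is not the method used in this study; the two example questions and the function names (`tokenize`, `build_vocabulary`, `vectorize`) are invented purely for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}

def vectorize(text, vocabulary):
    """Return the term-count vector of `text` over `vocabulary`."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector

# Hypothetical example questions (not taken from the study data).
questions = ["Where can I get ARVs", "How can I prevent HIV"]
vocabulary = build_vocabulary(questions)

print(sorted(vocabulary))
# ['arvs', 'can', 'get', 'hiv', 'how', 'i', 'prevent', 'where']
print(vectorize("Can I get ARVs", vocabulary))
# [1, 1, 1, 0, 0, 1, 0, 0]
```

Each question becomes a fixed-length vector of word counts, which is the kind of numeric representation that statistical and machine learning methods can work with directly.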

In the field of healthcare, physicians express patient medical records, opinions and findings in words that contain useful information that can be utilized to improve the quality of healthcare (Raja, Mitchell, Day, & Hardin, 2008). The healthcare environment is generally perceived as being 'information rich' yet 'knowledge poor' (Rani & Govrdhan, n.d). According to Belle et al. (2015), most healthcare data and clinical information contained in patient medical records is presented in narrative and unstructured formats, making it difficult to analyze; as a result, healthcare data is often not available for analysis. There is a lack of effective analysis tools to discover hidden relationships and trends in this data (Rani & Govrdhan, 2010). Consequently, this information is not easily accessible to people for healthcare improvement, teaching or research purposes. According to Aggarwal (2012), textual data requires advanced algorithmic text mining tools and techniques that can be utilized to discover hidden relationships and learn interesting patterns from this data in a dynamic and scalable way. The question is whether healthcare systems in low resource settings can utilize text mining algorithms to make use of the vast amount of healthcare data generated in order to improve healthcare.

Amidst the huge amounts of healthcare data collected is the data about HIV/AIDS. HIV/AIDS is still a major global health concern, especially in Sub-Saharan Africa. According to the Centers for Disease Control and Prevention (CDC) report (2016), in 2016, 36.7 million people worldwide were estimated to be living with the disease, while 1.8 million people became newly infected with HIV (https://aidsinfo.nih.gov/understanding-hiv-aids/hiv-aids-awareness-days/173/world-aids-day). According to the Joint UN Programme on HIV/AIDS (UNAIDS) report 2016, 1.1 million people died from AIDS-related illnesses worldwide.

In Uganda, according to a report by the Ministry of Health, 31,000 people died from HIV-related causes in 2014; AIDS-related deaths were 67,000 in 2010 and over 75,000 in the late 1980s and early 1990s (https://reliefweb.int/report/uganda/hiv-prevalence-general-population). A study by John (2015) shows that different people in the HIV/AIDS population usually seek information in support of self-care, treatment and prevention of the disease. However, little work has been done to closely examine the information gaps and information-seeking behaviors of these people, especially in low resource countries such as those in Sub-Saharan Africa. Previous studies have examined information needs and behaviors; however, most of these were done in more developed countries, mainly using manual methods of content analysis. A better understanding of the information needs and information-seeking behaviors of individuals with regard to HIV/AIDS will provide guidelines which, if utilized, will facilitate information interventions that bridge the current knowledge gaps for a better healthcare system in Uganda.

The Medical Concierge Group (TMCG) in Uganda (the test bed for this study) runs a 24/7 free medical call center. The organization has so far recruited over 1,000 participants in a mobile health study. This includes both HIV positive and HIV negative participants. The study is currently running in over 85 districts in the country. Once recruited, participants are given a toll-free telephone number which they can use to contact qualified medical personnel regarding any health issues, especially those related to HIV/AIDS, at any time of the day.

Details of the consultations done and questions asked are recorded by medical personnel in a database system called Asterisks. This data, however, is recorded in an unstructured text format. The medical call center holds a vast amount of data that has never been utilized due to the unstructured format of the questions asked. This information could be representative of the HIV/AIDS information needs in Uganda. Unfortunately, little is known about what these participants tend to ask or inquire about with regard to their health. This study aims at designing a supervised machine learning text mining method that will perform an HIV/AIDS question analysis to generate information on the HIV information needs and information-seeking behaviors of people on an HIV mobile health intervention at TMCG (Uganda).

1.1 Research Problem

According to Belle et al. (2015), most of the healthcare data and clinical information contained in patient medical records, such as medical history, consultation notes and findings, is presented in narrative and unstructured formats, which makes it very difficult to analyze. As a result, healthcare data is rarely utilized in supporting clinical decisions, teaching or research, or to improve healthcare.

Furthermore, there is a lack of effective analysis tools to discover hidden relationships and trends in this data for proper utilization. This study aims at employing text mining methods to explore patterns and trends in the information-seeking needs and behaviors of HIV/AIDS populations in low resource settings such as Uganda. HIV/AIDS populations have varying needs (Moradi, Mohraz, & Gouya, 2014), the most prominent of which is the need for information. According to John (2015), a thorough understanding of user information needs and behaviour is fundamental to successful information services. These populations seek information regarding prevention, treatment and self-care of HIV/AIDS in relation to the healthcare services provided. These information needs also vary based on several socio-demographic factors such as age, sex and economic status. For example, those who have just found out about their positive HIV status have different needs, such as how to access ARVs, while the youth might desire to learn about HIV prevention.

Women might also have varying needs compared to men. A thorough understanding of these information needs is imperative if the right information interventions, such as Interactive Voice Response (IVR) for providing self-help information, are to be implemented. In Uganda, little has been done to explore and understand these various information needs. This study aims at designing a text mining algorithm that will be used to analyze healthcare data. This will be achieved through designing a supervised machine learning algorithm that will be used to classify (categorize) HIV/AIDS information needs. The study will help in analyzing HIV/AIDS related questions from individuals on a mobile health care center system to identify information needs.

Understanding these information needs will provide guidelines that will aid the design of better information interventions in the country to improve the healthcare system.
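To make the idea of supervised classification of questions concrete, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing. This is only one possible choice of algorithm, used here as an assumption for illustration; the text above does not specify which supervised method the study adopts. The three categories follow the prevention, treatment and self-care needs mentioned above, while the training questions and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(labelled_questions):
    """Count documents and tokens per category from (question, category) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for question, category in labelled_questions:
        class_counts[category] += 1
        for tok in tokenize(question):
            token_counts[category][tok] += 1
            vocabulary.add(tok)
    return class_counts, token_counts, vocabulary

def classify(question, model):
    """Return the category with the highest smoothed log-probability."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(class_counts):
        # Log prior of the category.
        score = math.log(class_counts[category] / total_docs)
        total_tokens = sum(token_counts[category].values())
        for tok in tokenize(question):
            if tok not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = token_counts[category][tok]
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (total_tokens + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical labelled questions (not taken from the TMCG data).
training_data = [
    ("How can I avoid getting HIV", "prevention"),
    ("Do condoms protect against HIV", "prevention"),
    ("Where can I get ARVs", "treatment"),
    ("What are the side effects of ARVs", "treatment"),
    ("What food should I eat while on medication", "self-care"),
    ("How do I cope with stress after diagnosis", "self-care"),
]
model = train(training_data)

print(classify("Where do I collect my ARVs", model))  # treatment
print(classify("How do I avoid HIV", model))          # prevention
```

With real call center data, the labelled examples would come from questions categorized by hand, and the resulting category counts could then be broken down by factors such as age or sex to describe information needs.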

