
Virtual Networking Exchange
31st of July 2024

DS-I Africa CC

DS-I Africa Coordinating Center

The Open Data Science Platform (ODSP) and Coordinating Center (CC) for DS-I Africa are led by a well-established partnership between Prof Mulder (current PI, H3ABioNet) and Dr Michelle Skelton (current Coordinator, H3Africa CC), hosted within the Computational Biology Division and IDM at the University of Cape Town (UCT). Drawing on 8 years of experience leading a pan-African informatics consortium focused on data and informatics resources and capacity development, Mulder will establish the ODSP gateway and its eLwazi platform: a flexible, scalable African open data science platform for depositing, sharing and accessing data, selecting data-specific tools and data science methods, and deploying tools and workflows on a choice of computing environments suited to the African context, thereby facilitating novel discoveries for health. Skelton will draw on her 7 years of experience coordinating the large H3Africa consortium to develop an efficient CC that addresses the joint administrative, collaborative and logistical needs of the consortium. Together, they will transition their current teams and activities from H3Africa to DS-I Africa, concluding the former and launching the new, complementary endeavour with highly experienced teams ready to make an immediate impact. They will support each other's activities, technically from the ODSP and operationally and logistically from the CC, and will co-develop an effective data science and professional development training program. The ODSP will support the CC with website and tool development, while the CC will support the ODSP with events planning, managing data deposition, access policies and agreements, and integration with the DS-I Africa consortium.
The outcomes will be: 1) an open, transparent and sustainable ODSP developed with user input to ensure that users' needs are met, accompanied by adequate user and administrative support; 2) access to a choice of public and private cloud and local African computing facilities for data storage and analysis; 3) access to a comprehensive set of African and other relevant datasets, tools/workflows and resources required to apply data science techniques to biomedical data; 4) a comprehensive data science and professional development training program; 5) consolidated consortium policies, documents and resources; 6) efficiently organized consortium activities, workshops and events; and 7) avenues for exploring new collaborative and translational ideas and partnerships with industry and other stakeholders. Collectively, these provide the elements required for a fully supported, cohesive consortium that facilitates the application of data science to health. The ODSP and CC are African-led, developed in Africa by Africans, for the benefit of African scientists and research participants.


Bridging Gaps in the ELSI of Data Science Health Research in Nigeria (BridgELSI)

Data science is poised to impact scientific research, innovation, discovery and healthcare in sub-Saharan Africa (SSA), driven by the rapid growth of infrastructure such as cell phones and computers and by the availability of technologies such as Artificial Intelligence. These methods and resources present major opportunities to leapfrog current research, public health and clinical care in Africa by using data science to address the heavy burden of communicable and non-communicable diseases in SSA. Despite this promise, there are substantial concerns about the ethical, legal and social implications (ELSI) of data science research in SSA. These concerns arise from the use of conventional and unconventional data; from the methods for generating, manipulating, storing, sharing and using data in data science; and from the limitations of current informed consent models in these scenarios. They also open opportunities for novel strategies for legal oversight of the ELSI of data science research in Nigeria. In this collaborative project between the Center for Bioethics and Research (CBR), Nigeria, George Washington University, DC, and the University of Maryland School of Medicine (UMSOM), we will evaluate current legal instruments, guidelines and frameworks, and their implementation, and use these findings to develop new and innovative governance frameworks to support data science health research in Nigeria. We will also use mixed research methods to prospectively evaluate the knowledge, attitudes and practices (KAP) of data scientists and ethics committees regarding current and emerging ELSI of data science research in Nigeria. Given the novelty of data science in Nigeria, we will implement general and specific, short- and medium-term training in the ethics of data science research for data science researchers, and an introduction to data science for members of ethics committees reviewing data science research projects.

DS-I Africa - Law

The DS-I Africa Law project focuses on the legal dimensions of using data science for health discovery and innovation in Africa. Accordingly, the project complements other ELSI projects that may focus more on the ethical and social implications. Data law is multi-layered, complex and differs between jurisdictions. Without knowledge of the law, scientists risk beginning a health research project only to discover later that the data they have generated cannot be used as planned. This risks wasting resources and, in some cases, incurring legal liability and even criminal sanctions. The DS-I Africa Law project therefore provides scientists with the guidance they need to be legally compliant. The project has a broad jurisdictional scope, covering the law of twelve African nations. Five critical legal themes are investigated: (1) modes of informed consent to the use of data; (2) the nature and content of individual and community rights in genomic data; (3) the use of persons' geospatial data for public health surveillance; (4) the cross-border sharing of data; and (5) the use of data as a basis for Artificial Intelligence (AI). An important cross-cutting theme and goal is the decolonization and Africanization of extant law related to the use of data in health research. Thus, the project critically engages with the extant law of the twelve jurisdictions from the perspective of current trends in African legal philosophy. The project team consists of leading law academics highly experienced in the legal regulation of data.

Public Understanding of Big data in Genomics Medicine in Africa (PUBGEM-Africa)

With rapid advances in genomics, computational sciences and health informatics, the next decade will likely see growing interest in using big data to inform genomic medicine. This is likely to raise the enthusiasm of patients, big data scientists and proponents of genomic medicine. On the other hand, the use of big data for healthcare in general raises ethical, legal and social issues (ELSIs) relating to the feedback of individual genetic results, the use of research participant data to inform clinical care or population health, the risk of privacy breaches and a possible global big data divide. Much can be learned from the growing literature on the ELSI of genomics research in Africa to anticipate, and possibly address, some of the ethical issues linked to big data in genomic medicine. Given the dearth of empirical and normative ELSI analysis of big data in health in Africa, this U01 application seeks to develop an ELSI research project that will explore Public Understanding of Big data in Genomics Medicine in Africa (PUBGEM-Africa). In PUBGEM-Africa, we specifically aim: 1) to investigate models of public engagement and preparedness for big data use in health; 2) to explore the roles and responsibilities of different stakeholder groups (data providers, data producers, data users, funders and research ethics committees) regarding intellectual property, patents and commercial use of genomic big data in health; 3) to investigate public perceptions of big data in health and attitudes towards data governance; and 4) to develop data governance frameworks for big data-driven innovation and health in Africa. PUBGEM-Africa will explore these questions directly with populations living with different genetic conditions, namely sickle cell disease (high burden in Africa), non-inheritable hearing impairment (a vulnerable population) and Fragile X syndrome (a rare genetic condition).
This will allow us to showcase, first-hand, the attitudes and perceptions of populations with a genetic condition towards big data use, and to design systems for addressing the different ELSIs. PUBGEM-Africa will also lay the foundation for establishing a competitive centre of excellence in the ethics of data science for genomic medicine and emerging biotechnologies in Africa. Our team's prior experience implementing various H3Africa ELSI and genomic projects in Africa, the outstanding institutional environment at the University of Cape Town, long-term collaboration with institutions in Ghana and Cameroon, and an established collaboration with patient groups (for hearing impairment, sickle cell disease and Fragile X syndrome) put us in a good position not only to implement an ELSI study on big data in genomic medicine in Africa effectively, but also to ensure the long-term sustainability of ELSI activities on data science and health innovation in Africa.

Research for Ethical Data Science in Southern Africa (REDSSA)

The Research for Ethical Data Science in Southern Africa (REDSSA) project has three overall aims: to produce new knowledge regarding the ethical, legal and social implications (ELSI) of conducting data science research; to develop evidence-based, context-specific guidance for the conduct and governance of data science initiatives such as DS-I Africa; and to strengthen the culture of responsible data science in Southern Africa. The project will be conducted in three phases. Phase 1 is research-intensive and will obtain empirical data on key stakeholder views regarding the development of data science guidance to inform governance of DS-I Africa Research Hubs in Southern Africa. This phase will start with conceptual research and normative analysis of the ELSI issues related to data science. Important concepts to explore include data sovereignty, in which data protection is balanced with responsible data sharing. Given that the lay public often experiences digital data as intangible and abstract, the project will employ crowdsourcing as a form of citizen science to inform the development of innovative educational tools that could be adapted for stakeholder engagement in data science in the DS-I Africa network. Using these tools, we will conduct in-depth interviews with key stakeholders to ascertain their experiences with ELSI challenges related to data science, gaps in current guidance, and their views on procedural and substantive aspects of guidance development in data science governance. Key themes that emerge from the empirical research will underpin our approach to co-creating the guidelines, procedures and policies required in the DS-I Africa consortium. During Phase 1, ethics consultants embedded in the Research Hubs will address emerging ELSI concerns.
In Phase 2 of this project, we will develop guidance documents informed by the Phase 1 research, by best practices in international data science research guidance, and by the limited experience and existing literature to date on data science research and the governance of data management in Southern Africa. Importantly, this guidance will be informed by the values of solidarity, sharing and mutual benefit, which are important concepts in Southern African moral frameworks based on the communal good. This approach is congruent with health data ecosystems that require different stakeholders to work collaboratively for health innovation. The results of these policy-related activities will be tempered by the key concerns and principles identified in our conceptual and empirical work, and will provide locally grounded, practical guidance on the ELSI of data science research conducted in the hubs. In Phase 3, we aim to amplify the impact and enhance the sustainability of our research and governance activities by creating ELSI networks and communication channels focused on data science in Southern Africa. This will involve establishing an ELSI Data Science Southern African Network (EDSSAN) to respond to evolving ELSI concerns in DS-I Africa Research Hubs beyond the funding period, hosting annual conferences, and leveraging existing local networks.

Research Education

Application of Data Science to Build Research Capacity in Zoonoses and Food-Borne Infections in West Africa

The importance of Health Data Science in Africa cannot be overemphasized, as the burden of infectious and non-infectious diseases is striking across the continent. Despite this, health data science in Africa is grossly underdeveloped, mainly due to a lack of well-trained data scientists. In the last few years, there have been efforts to address the Data Science training needs in Africa. However, these remain limited in scope, and some crucial infectious diseases, such as zoonoses and food-borne infections, which take a significant toll on the continent, have not been sufficiently addressed. The aim of the proposed training programme is to enhance research into zoonoses and food-borne infections in West Africa through the application of data science. We propose three training tracks: (i) One-year research training for MSc students in West Africa. In this track, we will support the training of excellent candidates who have completed the first-year coursework of a relevant MSc programme. For their dissertation, the candidates will undertake a research project that applies bioinformatics, phylodynamics and/or disease modelling to zoonoses or food-borne infections. (ii) One-year research training for faculty members from West Africa. In this track, we will support early-career scientists and faculty from universities and research institutions who are interested in developing a research career in zoonoses or food-borne infections. They will apply bioinformatics, phylodynamics and disease modelling tools in research on zoonoses and food-borne infections. (iii) Workshops providing hands-on training in bioinformatics, phylodynamics and disease modelling to wider communities in West Africa. The long-term goal is to establish a core of West African scientists who can carry out rigorous health research projects using data science.

Data Science and Medical Image Analysis Training for Improved Health Care Delivery in Nigeria (DATICAN)

Medical imaging allows visualisation and understanding of human tissue and internal organs, thereby providing useful information for diagnosis, treatment and patient management. There is a gross shortage of medical image analysts and data scientists in Nigeria, which negatively affects health care delivery. DATa Science and Medical Image Analysis Training for Improved Health CAre Delivery in Nigeria (DATICAN) is a collaborative effort among Lagos State University (LASU), the University of Ibadan and Redeemer University in Nigeria, and the University of Chicago (UChicago) in the USA. DATICAN will leverage existing collaborative relationships between UChicago and LASU to provide early-career Nigerian physicians and scientists with a variety of mentored training and experiential learning opportunities. Our research education program focuses on (1) computer science and informatics, (2) statistics and mathematics, and (3) medical image analysis and public health, thus building capacity in Nigeria for data science and medical image analysis. LASU, ranked among the top universities in Nigeria for its cross-cutting research programs and the quality of its research output, will be the coordinating center. UChicago, a world-class university known for its emphasis on interdisciplinary research and its expertise in translational medicine and data science, will provide access to experienced faculty mentors and training resources. DATICAN will enroll 72 participants: MSc students (36), PhD students (12), postdocs (12) and junior faculty (12). Each of these graduate students and faculty members will design, develop and implement a data science project.
The projects will concern analysis of, and information extraction from, medical imaging data (e.g., mammogram, ultrasound, MRI and X-ray images), particularly with reference to non-communicable and infectious diseases that are high-priority public health concerns in Nigeria, including cancer, stroke, brain tumor and epilepsy, chronic lung diseases, malaria and tuberculosis. DATICAN will provide support (school fees, stipends, research funds and grants for conference attendance) to program participants for one to three years, depending on their needs. Participation will center on compulsory, two week-long hackathons held three times a year, but will also include monthly webinar trainings focused on the latest technologies in data science, medical image analysis and image processing; current research in data science and health care; and professional topics such as funding opportunities, writing grant proposals and scientific articles, grants management, presentation skills and career development. UChicago will lead the webinar trainings, providing faculty as well as access to technologies and cloud-based resources. We are confident that DATICAN will contribute to the sustainable growth of the community of data science experts in Nigeria and sub-Saharan Africa, as the program's participants branch out and mentor others in their own research communities. Substantial benefits will thus accrue from our effort, manifesting as growing numbers of Africans with the knowledge and skills to conduct data-intensive research that will improve the health of vulnerable populations in low-resource regions around the globe.

Eneza Data Science: Enhancing Data Science Capability and Tools for Health in East Africa

The Research Training program "Eneza Data Science: Enhancing Data Science Capability and Tools for Health in East Africa" will build on existing and new partnerships in data science, enabling systems thinking across disciplines in clinical practice, health research and policy through an integrative approach. ICIPE will lead the program together with Aga Khan University, with partners in Kenya (USIU Africa and Pwani universities) and internationally (including the University of Michigan and Open Pharma Foundation), to enhance data science skills so that the information in raw and often disparate data can be harnessed for healthcare improvement. The Eneza Data Science project aims to empower clinicians, health personnel, health researchers and data science researchers with open science skills and tools to turn data into meaningful insights for better health. Building on collaborations developed through Human Heredity & Health in Africa and the UZIMA DS-I Africa project hosted at Aga Khan University (Kenya), it will 1) build a robust research and training platform, 2) enhance data science knowledge, and 3) build skills in data science in Kenya and the East Africa region.

Growing Data-science Research in Africa to Stimulate Progress (GRASP)

Brain health, which determines brain capital, is central to achieving overall health and all the sustainable development goals. However, the current DS-I Africa programs do not include a dedicated training program in brain health research to unravel its determinants, especially the sociodemographic and lifestyle factors that play a major role elsewhere. GRASP aims to develop a sustainable cohort of African scientists to tackle the brain health burden in Africa by improving the data science skills of selected scholars so that they can unravel its determinants using the large integrated datasets available within the DS-I Africa network.

SYNthetic Healthcare DAta Platform for Data Science Training

This DS-I Africa UE5 application involves a partnership between the University of Rwanda (UR), the African Institute for Mathematical Sciences (AIMS), Washington University in St. Louis (WUSTL) and MDClone (a software company that generates computationally derived synthetic healthcare data), with participation from four DS-I Africa programs: 1) "Data Science for Health in Rwanda" U2R; 2) "Computational Omics and Biomedical Informatics Program (COBIP)" U2R, University of Cape Town (UCT); 3) the DS-I Law program; and 4) the Open Data Science Platform (ODSP) and Coordinating Center (CC). The overarching goal is to provide courses for skills development and mentored research projects using synthetic healthcare datasets generated by MDClone. The UE5 program objectives are as follows. Objective 1. Training in the use of synthetic healthcare data. UR/AIMS/UCT program faculty will be trained in the use of the MDClone software through short (1-2 day) beginner and advanced courses. Under peer mentoring from current expert MDClone ADAMS users at WUSTL and MDClone, they will gain hands-on training in the use of synthetic data. Subsequently, training will be available to MS, PhD, postdoctoral and junior faculty trainees during all 3 years. Competencies in using the software will be refined and prioritized, with harmonization and standardization coordinated with other DS-I Africa U2R RTPs and the Coordinating Center for wide dissemination during years 1-3. Objective 2. Deliver a mentored, hands-on, immersive, semester-long research project experience using synthetic healthcare data. The MDClone ADAMS software and the underlying computationally derived "synthetic" healthcare data at WUSTL will be made available for trainee use at UR/AIMS/UCT. Trainees will be paired with a faculty mentor and an expert MDClone ADAMS user to complete a research project on synthetic healthcare data, funded through the "Small Research Projects (SRP)" program, with a mentoring team.
A scientific publication is expected from each trainee. Objective 3. Evaluate the short-term training and semester-long research project. Trainees and faculty will evaluate the short-term synthetic healthcare data training program, and the evaluations will be used to improve the program over time. Exploratory Objective. Determine dissemination and long-term sustainability of the use of synthetic healthcare data in Rwanda and across the DS-I Africa consortium. Over the course of the 3-year program, and in coordination with the DS-I Africa ODSP/CC, we will hold regular workshops, symposia and meetings with academic, governmental and non-governmental organization (NGO) leaders to discuss opportunities for MDClone software use at major healthcare organizations. These conversations will set the stage for widespread use of synthetic healthcare data in Rwanda and across DS-I Africa. Discussions with the Rwandan Ministry of Health, Rwanda Biomedical Center, the National Institute of Statistics of Rwanda (NISR) and the National Council for Science and Technology (NCST) are ongoing to harness the power of MDClone ADAMS synthetic healthcare data to address challenges in Rwanda and the DS-I Africa Consortium.

West Africa Center of Excellence for Data Science Research Education

Sub-Saharan African countries are severely affected by the world's most devastating infectious diseases, including malaria, HIV/AIDS, tuberculosis and neglected tropical diseases, and are experiencing an unprecedented increase in noncommunicable diseases, including cancer, cardiovascular diseases and diabetes. In recent decades, public health, biomedical and clinical research in the region have also increased significantly with support from the National Institutes of Health (NIH) and other partners. These partnerships have contributed a breadth of diverse, large and complex clinical and biomedical datasets, providing unique opportunities for applying data science to make discoveries that may catalyze innovation in the diagnosis and therapy of diseases of public health interest in the region. However, a critical gap is that these data remain under-exploited because of limited human resources with the skills and expertise in data science needed to harness their value for meaningful public health impact. The overall objective of this UE5 Research Education Program, involving the University of Sciences, Techniques and Technology of Bamako (USTTB) and Gamal Abdel Nasser University of Conakry, Guinea (UGANC), with support from Tulane University, is to build interdisciplinary teams across multiple African institutions capable of using innovative quantitative and analytical approaches to generate and apply new knowledge from large or complex datasets in the subregion, through the following specific aims: 1) Strengthen existing institutional research training programs to address the need for optimal use and processing of large and complex datasets using advanced data science approaches.
We will provide faculty enhancement training and develop advanced data science courses to enrich the curricula of existing Masters and PhD programs and of professional development short-term training (for medical residents, public health professionals, clinicians and junior researchers); 2) Develop and deliver datathon (i.e., data analysis hackathon) training that involves multidisciplinary collaboration among researchers, data engineers, machine-learning experts, statisticians and other information scientists. Datathon participants will work in teams on research questions of interest in the area of infectious disease outcomes (e.g., malaria, NTDs, TB/HIV or emerging infectious diseases) and develop solutions in group settings; and 3) Produce a critical mass of trained public health professionals, disease control managers and researchers capable of working closely together to use data science to refine and guide control interventions most effectively, through a sequence of short courses focused on hands-on advanced data science. Short-term training sessions and modules will be developed through a multifaceted approach that facilitates partnership among researchers, local faculty and stakeholders for the future implementation of data science research.

Research Hub

Combatting AntiMicrobial Resistance in Africa Using Data Science (CAMRA)

Bacterial infections are highly prevalent and contribute significantly to morbidity and mortality across all age groups. However, because appropriate microbiologic diagnostic services are nonexistent or very limited, ascertaining the specific etiologic agents and the true prevalence of antimicrobial resistance (AMR) is a major challenge in Africa, where objective data are limited.
Preliminary observations from Nigeria, the most populous country in Africa, include the following:
1. high prevalence of Salmonella enterica serovar Typhi (S. Typhi) with 45% prevalence of multidrug resistance,
2. high prevalence of extended-spectrum β-lactamase-producing Enterobacteriaceae (ESBL-E) bloodstream infections,
3. maternal colonization by ESBL-E in women at delivery is associated with high all-cause morbidity and mortality in their newborn babies compared to ESBL-unexposed babies, and
4. periodic outbreaks of carbapenem-resistant Klebsiella pneumoniae sepsis in newborn units with case fatality rates as high as 45%.
We have preliminary molecular characterization of about 500 of the 2,750 diverse bloodstream bacterial isolates in our collection. In addition, we have access to a collection of over 2,500 clinical isolates of blood, sputum, and urine from our sentinel laboratories in Nigeria and Rwanda that are partially characterized.
Our overall strategy is focused on three thematic areas:
1. comparative phenotypic and genotypic studies of archived and contemporary clinical isolates to inform trends in AMR and dynamics of transmission,
2. incorporation of acute inflammatory markers of serious bacterial infection and gene products from resistant bacteria into a portable screening tool for clinical care, and
3. exploration of the potential benefit of an aminoglycoside (tobramycin) conjugated to an antimicrobial peptide for enhanced bactericidal activity against multidrug-resistant Enterobacterales.

Developing data science solutions to mitigate the health impacts of climate change in Africa: the HE2AT Center

The world's climate is changing rapidly: global temperatures have risen more than 1°C since the industrial revolution, and a further 0.5°C increase is likely by 2040. Heat waves and rising temperatures have major, though underappreciated, health implications, particularly for vulnerable populations in low-income settings. The overarching objective of the Heat and Health African Transdisciplinary Center (HE2AT Center) is to develop innovative solutions to mitigate the health impacts of climate change in Africa. The consortium of academic and non-academic partners is drawn from across sub-Saharan Africa and the United States, and constitutes a transdisciplinary group that includes heat physiologists, biomedical and climate content experts, public health practitioners, social-behavioral scientists, statisticians, and computer and data scientists. The Center will systematically develop a data ecosystem containing biomedical data integrated with weather, air quality and other environmental data, and other geospatial data, within two existing, highly complementary data platforms (IBM-PAIRS and the University of Cape Town). Over five years, we will implement two Research Projects and 10-12 Pilot Projects, all streamlined and supported by the Administration, Data Management and Analysis, and Training and Engagement Cores. The first Project will implement an innovative data science approach to characterize the clinical outcomes of heat exposure in pregnant women and neonates. We will reuse data from cohorts and trials among pregnant women and neonates conducted across sub-Saharan Africa since the year 2000. Data from systematically identified studies will be integrated into an Individual Participant Data platform from data repositories and data owners. Analyses of the relationships between heat exposure and outcomes (preterm birth, birth weight and pre-eclampsia) will then inform quantification of the heat-related disease burden.
Finally, taking all findings together, we will pilot a district-level climate change indicator, the first of its kind. The second Project assesses the burden of heat-related morbidity in vulnerable urban settings using geospatial and heat hazard analyses in Abidjan, Cote d'Ivoire and Johannesburg, South Africa. This Project uses more complex data and data sources on the built environment and topography, for example, to assess heat-health impacts, and how these vary across urban geographies. Activities will inform development of an Early Warning System, including a digital App that delivers information to people on their forecasted risks of heat-health disease, based on their individualized risk profile, as determined by a machine learning algorithm which takes into account weather conditions, individual characteristics, geolocation and other factors that drive risk. These systems are a central element in heatwave responses, allow for adequate preparations for heat events, which is especially important for vulnerable groups and industry. We will collaborate closely with other Hubs and parts of the DS-I Africa consortium, supporting them to incorporate climate data within their research activities, and vice versa.
Public Health Relevance Statement
HE2AT Center Overall - Project Narrative
The Heat and Health African Transdisciplinary Center (HE2AT Center) will generate new knowledge on the impacts of heat waves and extreme heat on clinical conditions and on people living in vulnerable urban settings, applying data science approaches to existing environmental and high-quality health outcomes data. Perhaps more importantly, the Center will develop and test innovative solutions for tracking heat-related conditions at the district level and for providing individualized early warnings of dangerous heat periods relevant to high-risk groups, industry and the general population. In all Center activities we will build the capacity of the team and key organizations, engage with communities, government and other stakeholders, and work across the DS-I Africa Program.

Harnessing Data Science to Promote Equity in Injury and Surgery for Africa

Trauma and other surgically treated conditions are a crippling, unaddressed burden of disease that disproportionately affects sub-Saharan Africa (SSA). The Data Science Center for the Study of Surgery, Injury, and Equity in Africa (D-SINE Africa) is a strategic partnership between the University of Buea (Buea), the University of California (Los Angeles (UCLA) and Berkeley), the Cameroonian Ministry of Public Health, the African Institute for Mathematical Sciences in Cameroon, and the University of Cape Town in South Africa. D-SINE Africa will address the intersection of health disparities with the risk factors and outcomes associated with injury and surgical disease in SSA. Although more abundant than ever, data is still a limited resource in low- and middle-income countries. Our approach views the big data available in SSA as an opportunity to develop sustainable, data-constrained approaches appropriate for resource-constrained settings. Data science will thus be harnessed to address the Hub's two main goals: 1) to decrease the burden of injuries and surgical diseases through improved surveillance, prevention, and treatment; and 2) to improve access to quality surgical care in Cameroon and other SSA countries. These goals will be achieved through three specific aims: 1) Research, 2) Networking and 3) Capacity Building. These aims will be implemented through three cores (Administrative, Data Management and Analysis, and Capacity Building) and two Research Projects to be conducted in Cameroon, South Africa, and Uganda. Research Project 1 - Health Equity Surveillance addresses the urgent gap in rapid socioeconomic status (SES) estimation needed to track health equity in acute care settings by applying a clustering algorithm to existing, publicly available Demographic and Health Surveys (DHS) data sets for SSA. 
Research Project 2 - Trauma Follow Up Prediction aims to improve trauma outcomes by using machine learning to optimize a mobile phone-based screening survey that will identify which trauma patients would benefit from further care after discharge from the hospital, again using a data-reduction ("big data to small data") approach in line with the Hub's commitment to sustainable data use practices. These projects use a common data source (the Cameroon Trauma Registry), share harmonizing themes (injury, equity, and data reduction), and will yield findings that can be used together (e.g., identification of SES groups vulnerable to poor follow-up care). The Hub has innovatively built community engagement vehicles into the Internal Advisory Board to promote a multi-faceted approach to building trust with research participants and end-users. D-SINE Africa's efforts will drive innovation and impact in data science, injury, and equity research to improve access to surgical disease prevention and care, including for those whose SES conspires to increase their vulnerability to injury and surgical conditions while reducing their access to consistent surgical care. The joint infrastructure of D-SINE Africa's strategic partners is available for use by its members and the DS-I Africa consortium at large.
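The abstract does not name the clustering algorithm or the DHS variables used in Research Project 1. As a rough sketch only, assuming plain k-means over hypothetical binary household-asset indicators (electricity, piped water, radio, mobile phone, refrigerator), households could be grouped and the groups then ordered into SES tiers by mean asset ownership:

```python
from statistics import mean

def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm). For reproducibility the first k
    distinct points serve as initial centroids (an assumption; real analyses
    would normally use randomised restarts such as k-means++)."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = updated
    return labels, centroids

def ses_tiers(households, k=3):
    """Map each household to an SES tier: 0 = lowest asset ownership."""
    labels, centroids = kmeans(households, k)
    order = sorted(range(k), key=lambda c: sum(centroids[c]))
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]

# Hypothetical households; columns: electricity, piped water, radio,
# mobile phone, refrigerator (1 = owned/available).
households = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
print(ses_tiers(households))
```

With this tiny hypothetical sample, the household [0, 1, 1, 1, 0] ends up in the top tier rather than the middle one; cluster boundaries depend on k and on initialisation, which is why any such scheme would need validation on real survey data.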

MADIVA (Multimorbidity in Africa: Digital innovation, visualisation and application)

The MADIVA Research Hub will develop data science to address the complex interplay, individual and public health challenges, and rising prevalence of multimorbidity in Africans. The epidemiological transition in Africa has greatly changed health risks at both individual and public health levels. Trauma, infectious and non-communicable diseases are major health challenges, and their co-occurrence multiplies the burden on individuals and communities. The inherent value of the increasingly rich and varied existing data has not been exploited. We have rich longitudinal epidemiological data sets, partial but increasingly available clinical and laboratory records, and genomic data. The data, as is common in the domain, is highly heterogeneous and, in parts, fragmented and incomplete. The Hub will be led by the University of the Witwatersrand, Johannesburg, collaborating with the African Population and Health Research Center (Nairobi, Kenya), IBM Research Africa, the South African Population Research Infrastructure Network and Vanderbilt University Medical Center. The initial sites are an urban population in Nairobi and a rural, transitioning site in Bushbuckridge, South Africa; each has rich longitudinal data on more than 100,000 individuals. Although incomplete, clinical records are increasingly being captured digitally. Notably, genomic data are available for a subset of individuals at each site. Multimorbidity is inherently multi-factorial, and fragmented, unlinked data hinder solutions, so Project 1 focuses on linking these data. Integrated data sets will aid in understanding these communities, which are representative of many in Africa, and the protocols developed will be applicable elsewhere. A researcher dashboard will allow researchers to find data and plan projects. A clinic dashboard will help clinic managers and health officials to monitor and plan effectively. 
Project 2 will be solution-oriented, developing novel data science methods to provide actionable insights. We will develop robust methods for risk profiling of individuals and communities using heterogeneous data. We will pioneer the use of polygenic risk scores in Africa. Key data science questions are the explicability of results and managing high-dimensional data without loss of power. An innovative aspect of our grant will be translational work with public precision health to provide insights into drug dispensing, pharmacogenomics, and behavioral or possible social interventions. The Capacity Development & Pilot Project Core will support a tiered set of pilot projects for African researchers at different career stages, leveraging our strong research platforms and networks. We aim to build capacity in key areas, including incubation and cost modelling for data science. The Administrative and Leadership Core will provide the necessary leadership, project management and links to other DS-I components. The Data Management and Analysis Core will develop protocols for the responsible sharing of data, data quality control and technical support, and will develop innovative analysis techniques.

MUST Data Science Research Hub (MUDSReH)

Data science holds great promise for sub-Saharan Africa, yet research capacity and appropriate training are limited. Medical images are particularly exciting for data science in this setting given their ability to facilitate care delivery across healthcare cadres. Efforts to translate the knowledge gained through data science into improved clinical care can be fostered through implementation science. Strategic multi-disciplinary, multi-institutional partnerships will be key to the development and ultimate impact of data science for African-based solutions. Innovation. This proposal is innovative in 1) its use of data science with medical images to advance clinical care, and 2) its integration of data science with implementation science to promote clinical impact. Approach. The Mbarara University Data Science Research Hub (MUDSReH) in Uganda will use multiple technologies to improve the capture of medical images and employ key data science methods, such as machine learning and artificial intelligence, to expand their utilization in sub-Saharan Africa. Clinical impact will be strengthened through implementation science methods. Formal training will be available in both data science and implementation science, and ongoing expansion of efforts will be promoted through regional summits and collaboration. The two initial research projects, optimized posterior fundus imaging to diagnose posterior segment eye disease and mobile-phone-based cervical images to detect cancer, will involve partnerships with the College of Ophthalmologists of Eastern, Central and Southern Africa in Kenya and the Kwame Nkrumah University of Science and Technology in Ghana, respectively. The Massachusetts Institute of Technology will provide technical support for data management and analysis, and Massachusetts General Hospital will support the use of implementation science and other relevant content areas for individual research projects. 
Each project will work with local Ministries of Health, community-based organizations, and/or technology partners to advance the mission of harnessing medical images for improved clinical outcomes and impact. Our specific aims are as follows: 1. To integrate all partnering academic institutions, technology companies, and non-governmental organizations to advance data science for medical imaging. All partners will contribute to a common infrastructure for data processing and analysis. We will establish both the administrative and technical mechanisms for data exchange, accounting for necessary data security and quality assurance and control. We will further create pathways for facile expansion to include additional research projects involving medical imaging (e.g., in radiology and dermatology) at multiple African institutions. 2. To integrate implementation science with data science to advance clinical impact. All research projects in the MUDSReH will include implementation science methodologies to ensure the developed technology reaches the end user in practical ways that meet local needs and priorities.

Role of Data Streams In Informing Infection Dynamics in Africa (INFORM Africa)

A key problem in Africa is the paucity of population-scale epidemiologic data sources and of the analytical capacity to rapidly identify and understand infectious disease pandemics. Yet, in the context of fragile health systems and limited resources, the need for population-scale epidemiologic data to provide information on transmission dynamics and inform interventions is even more urgent. The overall goal of the ‘Role of Data streams In Informing infection dynamics in Africa (INFORM Africa) Hub’ is to effectively use big data to address the pressing public health needs of the SARS-CoV-2 and HIV pandemics, developing population-scale data streams as a cornerstone of future pandemic preparedness. INFORM Africa proposes to use existing data from Nigeria and South Africa, the two most affected countries in Africa, which account for 41% of the continent’s SARS-CoV-2 infections and about 40% of its HIV burden. The Hub is led by two well-established and successful non-governmental organizations, the Institute of Human Virology Nigeria (IHVN) and the Centre for the AIDS Programme Of Research In South Africa (CAPRISA), both with strong links to universities and their respective government agencies, which are also engaged. INFORM Africa also partners with an industry partner, Akros Zambia. INFORM Africa has assembled experienced researchers with complementary expertise in big data analytics, quantum information processing, spatial statistics and analysis, genetics, computational biology, agent-based and data-driven modelling, clinical infectious diseases, infectious disease epidemiology, molecular virology, and geospatial analytics to address the goal of this submission through four Specific Aims. 
AIM 1 establishes data streams from the public and private sectors in order to understand the multilayer interactions that may explain the dynamics and impact of the COVID-19 pandemic, through three proposed Research Projects and two proposed Cores, supplemented by the pilot projects. AIM 2 develops geospatial tools for use by country leadership and governments in pandemic surveillance and response to improve preparedness. AIM 3 expands data science research opportunities and capacity through engagement with the broader DS-I Africa consortium and through several proposed pilot projects in data science. AIM 4 maintains sustained engagement with policy makers and governments in order to promote further open access to high-quality data and the redistribution and uptake of any product or tool developed by INFORM Africa. The DMAC & NGS Core is the linchpin of INFORM Africa, assembling and managing the Research Hub’s data and providing seamless access to a set of tools and workflows that link the Hub to the broader DS-I Africa Open Data Science Platform and Coordinating Center. The Administrative Core harmonizes and streamlines administrative, financial, and communication processes for INFORM Africa and coordinates the selection of pilot projects consistent with INFORM Africa’s aims and in compliance with DS-I Africa requirements. Project and Core leads make up the MPD/PI leadership and the Steering Committee of INFORM Africa, supported by a Scientific Advisory Board.

UZIMA-DS: UtiliZing health Information for Meaningful impact in East Africa through Data Science

Overall Component
Africa is the youngest continent in the world, with 60% of its population under the age of 25. The span from early life to young adulthood represents a critical window in which biological, environmental and psychosocial events can significantly impact long-term uzima, which means health/well-being in Swahili. With recent technological advances and the enormous volumes of data collected in Africa, there is an unprecedented opportunity to leverage data science to identify and improve the health trajectories of young Africans. However, significant analytical and computational barriers persist that impede our ability to use this information to change care at the community and individual levels. Our proposed Research Hub, UZIMA-DS, aims to change this narrative by UtiliZing health Information for Meaningful impact in East Africa through Data Science. We will create a scalable and sustainable platform that applies novel approaches to data assimilation and advanced artificial intelligence (AI)/machine learning (ML)-based methods to serve as early warning systems for critical health issues affecting young Africans in two domains: maternal, newborn and child health, and mental health. Our Hub addresses three critical needs across the translational spectrum of data science: 1) harmonization of multimodal data sources for meaningful use and analyses; 2) leveraging temporal patterns in data to identify trajectories through prediction modeling using AI/ML-based methods; and 3) engaging with key stakeholders to identify pathways for dissemination and sustainability of these models in target communities. 
For our Maternal and Child Health Study (Project 1), we will leverage the large and diverse existing data sets in Kenya, including two demographic surveillance systems, cohort studies and hospital data, to develop and validate AI/ML-based prediction models that identify women of childbearing age at high risk for poor pregnancy outcomes (e.g., pregnancy-induced hypertension, low birthweight) and non-communicable diseases later in life, as well as children at risk of poor future life outcomes (e.g., developmental delays). For our Mental Health Study (Project 2), we will leverage existing surveillance data as well as novel mobile technologies (e.g., mobile apps, wearables) to adapt existing and develop new AI/ML-based prediction models that identify adolescents and young healthcare workers at risk of depression and suicidal ideation in Kenya. Our Hub and Projects will be supported by an Admin Core, a Data Management and Analysis Core, and a Dissemination and Sustainability Core, which will facilitate engagement with multisectoral stakeholders to identify sustainable model dissemination pathways into target communities. Ultimately, our work will empower African researchers to carry forward the UZIMA-DS Hub to address the ongoing and evolving health needs of Africans by building sustainable infrastructure, expertise, and partnerships for long-lasting impact. The UZIMA-DS Hub can serve as a model that can be scaled to other countries and health domains within the greater DS-I consortium to transform care delivery in Africa, ensuring that current and future generations of Africans can achieve uzima.

Research Project

Advancing discovery for developmental disorders - expanded analysis of the DDD-Africa resource

Developmental disorders (DD) represent a spectrum of often severe disabilities that are present from birth or early childhood. The Deciphering Developmental Disorders in Africa study (DDD-Africa) was started to describe and define the causes of DD in Africa, and this resource holds the potential for much more discovery using data science and omics techniques. To advance our knowledge, we propose to use the clinical information, genetic data, and DNA resources from the DDD-Africa study to increase diagnostic yield and make new discoveries about the causes of DD in African patients. These findings will lead to population-relevant and scalable solutions for Africa, fast-tracking genomics efforts for global health.

Artificial Intelligence assisted echocardiography to facilitate optimal image extraction for congenital heart defects diagnosis in Sub-Saharan Africa

Sub-Saharan Africa (SSA) accounts for over 50% of all global under-5 deaths. Congenital anomalies (CAs), notably congenital heart defects (CHD), which constitute about a third of all CAs, are a major contributor to this high under-5 morbidity and mortality in SSA. Late and missed diagnosis, owing to the lack of experts who can perform an echocardiography scan, remains the primary challenge to CHD diagnosis and care in SSA. Recently, there has been increased uptake of CHD screening in newborns by pulse oximetry. However, the test is nonspecific and still requires expert confirmation through echocardiography. The few expert paediatric cardiology centres that exist are often located hundreds of kilometers away from the birthing centres, placing an enormous financial and physical burden on parents who must undertake this journey to confirm their baby's diagnosis, not to mention the risk to the particularly fragile and vulnerable neonate, who may die in the course of the journey. Training programs have been shown to improve image capture and recognition of the anomaly. However, such programs are labor- and time-intensive and need to be repeated with staff turnover. A complementary strategy is therefore needed to improve and sustain the gains from training. In line with DS-I Africa's mission to address critical health gaps through the application of data science, our proposed project seeks to leverage modern advances in data science and artificial intelligence (AI) to address the problem of CHD diagnosis in SSA by enabling low-skilled sonographers to conduct echocardiography scans in neonates (0-28 days) and extract optimal images that can subsequently be transmitted to a remote expert for interpretation. 
This means that local non-experts (e.g., GPs, nurses, midwives) serving the birthing centres will be able to conduct postnatal echocardiography scans in neonates suspected of having a CHD after pulse oximetry screening, obtaining optimal labelled images/video clips that can be transmitted to a remote expert for confirmation of the diagnosis. This will remove the burden and risk of travelling hundreds of kilometers, increase early diagnosis and remote initiation of care, and reduce the workload on the few available experts. Future steps will include extending the approach to prenatal diagnosis and to predicting the actual diagnosis.

Automated Mobile Microscopy for Malaria Diagnosis and surveillance in Uganda

Malaria is one of the leading health problems of the developing world. Malaria endemicity has been attributed to poor diagnosis at the laboratory level, which quite often leads to misdiagnosis and drug resistance. Many developing countries lack a critical mass of laboratory technicians to diagnose the disease through the gold-standard method of microscopy, and this has worsened the already dire situation in some of these countries. Worldwide, emerging technologies are increasingly based on machine learning and deep learning techniques, and these can be combined with smartphones to improve disease diagnosis. However, most previous work on automating microscopy diagnosis has been carried out in an ad hoc manner in the laboratory environment, and no study appears to offer a practical, field-deployable solution. The goal of the proposed research is to develop a rapid, low-cost, accurate and simple in-field screening system for microscopy-based diagnosis of diseases such as malaria. Specifically, this study will test and validate previously developed image analysis models for real-time, field-based diagnosis and surveillance of malaria. The proposed solution builds on our earlier work on mobile microscopy at the Makerere University AI Lab, which has approached automated microscopy by exploiting recent advances in 3D printing to develop a low-cost 3D-printed adapter. This adapter enables a wide range of smartphones to be attached to a microscope; in addition, we have implemented deep learning models for pathogen detection, providing effective hardware and software, respectively.

BCX-Africa: Utilizing data science to evaluate the applicability of blood cell traits polygenic risk scores for disease prediction in Africa

The goal of the proposed work is to aggregate, manage, store and analyse the largest genetic data set of blood cell traits in continental Africa for novel gene discovery and genetic risk prediction, using the DS-I Africa-funded Open Data Science Platform (ODSP), which allows for effective collaborative research in a resource-limited setting. Circulating blood cell traits are critical intermediate clinical phenotypes for a range of disease outcomes, including cardiovascular, oncologic, hematologic, immunologic and infectious disease. Genetic factors play an important role in determining these traits. Hundreds of genetic loci have been identified using the conventional genome-wide association study (GWAS) approach, but in European and East Asian-ancestry populations. In view of the genetic diversity of Africans relative to other ancestries, and their representation of only 1.1% in global GWAS, more studies are required to unravel the population-specific variants that have been noted to exist for blood cell traits. Given the central importance of Africa to human origins and disease susceptibility, there is a clear scientific and public health need for large-scale efforts examining blood cell trait susceptibility in populations of African descent, which might yield insights into disease etiology and potential therapeutic strategies that benefit other ethnicities. Specifically, we will aggregate blood cell trait GWAS data across continental Africa, leveraging existing partnerships in Africa including H3Africa. We will test for association between genetic variants and each blood cell trait in a GWAS meta-analysis and refine genetic association signals at new and existing blood cell trait association loci. We will develop polygenic risk scores (PRS) for blood cell traits in Africa and assess their transferability, including evaluating the predictivity of the blood cell trait PRS for hematologic and cardiometabolic traits. 
Our proposal is directly aimed at a key mission of RFA-RM-22-023, to support harnessing data science for health discovery, and will not only generate important findings relevant to human health but also serve as a vehicle to improve genomic data science research capacity in Africa. Our proposal also directly addresses the NIH Strategic Plan for Data Science. The unique collaboration within the Human Heredity and Health in Africa (H3Africa) consortium for cross-group analysis of African data strengthens scientific research capacity through training in all participating institutions, facilitating shared expertise and resources.
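The abstract does not specify the meta-analysis model. A common choice for GWAS meta-analysis is the inverse-variance-weighted fixed-effects model; the sketch below assumes that model and pools hypothetical per-cohort effect estimates (beta) and standard errors for a single variant:

```python
import math

def ivw_fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one variant.

    betas: per-cohort effect estimates; ses: their standard errors.
    Returns the pooled beta, its standard error, the z statistic and a
    two-sided p-value from the standard normal distribution.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p

# Hypothetical per-cohort summary statistics for one variant and one trait.
cohort_betas = [0.30, 0.00]
cohort_ses = [0.10, 0.20]
beta, se, z, p = ivw_fixed_effects(cohort_betas, cohort_ses)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

The more precise cohort (smaller standard error) dominates the pooled estimate. A full pipeline would also align alleles across cohorts and assess heterogeneity (for example with Cochran's Q); those steps are omitted here.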

CHaracterizing Effects of Air Quality In Maternal, Newborn and Child Health: The CHEAQI-MNCH Research Project

The overarching goal of this CHEAQI-MNCH Research Project is to test an innovative data science approach involving quantification of the current and future impacts of air pollution on maternal and neonatal health in sub-Saharan Africa.

DSpace: Utilizing Data Science to Predict and Improve Health Outcomes in Pediatric HIV

The project combines longitudinal electronic medical records with 'omics data to predict the development of metabolic syndrome in HIV-infected children using artificial intelligence/machine learning techniques. In addition, the project seeks to develop more sensitive TB diagnostic biomarkers using electronic health records and multi-omics data.

Genome-wide characterization of complex variants and their phenotypic effects in African populations

Advances in omics technology have the power to provide integrative models of disease risk and influence health outcomes. However, the utility of these models has so far been limited to non-African populations, due to biases in available datasets. Further, efforts to identify medically relevant genetic variants have included only a subset of known genetic variants and have had limited focus on phenotypes most relevant to Africa. Newly available genomic datasets from the African continent provide a rich opportunity to begin addressing this gap.

Geo-enabled detect and respond system for antimalarial resistance in Ghana: GDRS - Ghana

INnovative data Science to Impact the TB Epidemic (INSITE)

New drugs and shortened combination regimens have been introduced for the prevention and treatment of drug-susceptible and drug-resistant tuberculosis (TB), but their impact on pregnant people and their infants, including TB transmission and pediatric TB, is uncertain. This partnership between the Centre for Infectious Disease Epidemiology & Research, University of Cape Town, and the Western Cape Health Intelligence Directorate will consolidate existing relationships across clinical, epidemiological and data science disciplines to establish a scalable data infrastructure for the management of maternal TB at the regional and individual clinical levels. Data science techniques will be employed to develop and enhance a novel and robust public service population-level data harmonization platform to i) create actionable clinical tools to optimize person-level interventions and monitor programs; and ii) generate large, linked cohorts to address epidemiological questions and assess the impact of policy interventions and clinical tools at the population and individual levels.

Integrated modeLs for Early Risk-prediction in Africa (ILERA) study

The Integrated modeLs for Early Risk-prediction in Africa (ILERA) study ("Ilera" means health in Yoruba) aims to investigate the potential for improving the prediction of 13 cardio-metabolic disease (CMD) indicator levels (and thereby of CMDs) by integrating diverse streams of data into risk prediction models. Starting with the currently best-performing representative polygenic risk scores (PRSs), we plan to progressively add layers of data, such as predicted transcriptomes and environment and lifestyle information, and assess whether this additional data, either independently or in combination, could improve prediction. To allow for complex and non-linear interactions between these factors, data-driven approaches will be employed to integrate these variables with the genomic data. In-depth evaluation of the predictivity of these models will be performed in independent cohorts from South, East and West Africa and in longitudinal data from the same cohort. The potential for an early warning system aimed at public health intervention will be investigated using the combination of the best predictive models and traits.
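A PRS is conventionally the sum, over variants, of an individual's effect-allele dosage multiplied by that variant's per-allele effect size. The abstract does not say which published scores ILERA will start from, so the sketch below uses hypothetical variant IDs and weights. It computes a raw score, reports how many of the score's variants were available, and standardises scores within a cohort so that further data layers could be added to a model on a common scale:

```python
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """PRS_i = sum_j w_j * x_ij over the variants j present for individual i.

    dosages: variant ID -> effect-allele dosage (0-2; imputed values may be
    fractional). weights: variant ID -> per-allele effect size.
    Missing variants are skipped; the count actually used is returned for QC.
    """
    used = [v for v in weights if v in dosages]
    return sum(weights[v] * dosages[v] for v in used), len(used)

def standardise(scores):
    """Z-score raw PRS values within a cohort."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]

# Hypothetical three-variant score (IDs and weights are illustrative only).
weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
cohort = {
    "person_a": {"var1": 2, "var2": 1, "var3": 0},
    "person_b": {"var1": 1, "var3": 2},  # var2 not genotyped
    "person_c": {"var1": 0, "var2": 2, "var3": 1},
}
raw = {pid: polygenic_score(d, weights) for pid, d in cohort.items()}
z = standardise([score for score, _ in raw.values()])
print(raw)
print(z)
```

Skipping missing variants, as done here, lowers coverage for some individuals (person_b uses two of three variants); mean-imputing dosages from a reference panel is a common alternative.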

Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the second leading cause of cancer-related deaths worldwide. Rates in Africa are on the rise, but essential histopathology services critical for cancer care are scarce. To address this barrier, we developed an artificial intelligence (AI)/machine learning (ML)-based computational pipeline (SIVQ/VIPR) that performs automated pixel-level image segmentation and classification from digital images of routinely collected hematoxylin and eosin (H&E)-stained slides. SIVQ/VIPR is highly precise and reproducible, and it outperforms subject matter experts. Once histologically distinct regions are identified, image analysis algorithms can identify individual regions and aggregate them to predict diagnostic and prognostic features which, in conjunction with clinical outcomes, can guide treatment. Our overall approach is to leverage our validated SIVQ/VIPR computational pipeline to develop and validate an AI-based diagnostic decision support (AI-DDS) tool for CRC diagnosis and prognosis in an existing Kenyan cohort. To carry out this work, the Aga Khan University (AKU)-East Africa and the University of Michigan have partnered with Tenwek Hospital, a non-academic, community-based public hospital in rural Bomet, Kenya, to form a collaboration of oncologists, pathologists, surgeons, statisticians, and informaticians, making us uniquely suited to develop population-relevant, affordable, and scalable data science solutions in Kenya, all priorities of the DS-I Africa Program. We will: Aim 1. Adapt and validate an existing ML-based diagnostic algorithm for CRC using digital fields of view from H&E-stained slides in a retrospective cohort of n=675 CRC cases from the AKU and Tenwek Hospitals. 
We will apply the CRC-trained SIVQ/VIPR computational pipeline for segmentation and classification of CRC features, followed by a confirmatory classifier step to obtain a binary, case-level cancer/no-cancer result (i.e., a diagnosis). Aim 2. Develop and refine an unsupervised ML method to identify histopathology image-derived measurements associated with CRC prognosis. We will use computer/machine vision approaches to identify image features (e.g., cellular morphology) that discriminate CRC prognosis and the biological potential for disease aggressiveness. AI-based morphological signatures of aggressive disease (e.g., high-grade tumor architecture) will be combined with other clinically relevant features towards the goal of generating a multi-axial, multiplexed AI-DDS tool that can maximally inform on the biological and metastatic potential of each CRC case. This project will lay the groundwork for an AI-DDS tool for clinicians (e.g., pathologists, oncologists) that facilitates prompt and accurate diagnosis, prognosis, and risk stratification for CRC care in Africa. Because this approach leverages open-source software and can be deployed as a turn-key system intended for web-based cloud deployment, it is well suited for capacity building, integration into educational programs, and expansion to other emergent or prevalent cancers (e.g., breast, cervical, prostate) as part of the DS-I Africa Consortium.

Leveraging Data Science Applications to Improve Children's Environmental Health in Sub-Saharan Africa (DICE Project)

The proposed research harnesses data science applications to establish the spatial variability in the impact of ambient PM2.5 exposure on children's health in Sub-Saharan Africa (SSA) and to identify the explanatory and moderating factors. It is relevant because it leverages data science tools to address a pressing public health problem in Africa, in line with the overarching goals of the DS-I Africa program. The research will enhance public health data science capacity on the continent, and its findings are expected to trigger investment in air pollution control as well as policy action to address area-level and household poverty, helping to improve child health and survival in SSA.

Tuberculosis in households with infectious cases in Kampala city: Harnessing health data science for new insights on TB transmission and treatment response (DS-IAFRICA-TB)

Tuberculosis (TB) is prevalent in Uganda and overlaps with an already high burden of HIV/TB coinfection. While almost all hospital-based TB cases in Kampala city, the capital of Uganda, have clear TB symptoms, 30% or more of the people with undiagnosed TB identified through active case finding are asymptomatic for TB; moreover, the host risk factors for TB in Kampala cannot be distinguished from risk factors associated with the environment. Complicating this further, anti-TB treatment failure rates in Uganda are substantially higher than global estimates (17% vs. 10%). These TB-specific challenges represent only a fraction of the complexity underlying the disease, especially in endemic settings with a high burden of HIV/AIDS. Data science methods, especially artificial intelligence (AI) and/or machine learning algorithms, can unravel such complexity and untangle the host, pathogen and environmental factors underlying TB, which have hitherto been difficult to explain or predict with conventional approaches. In this proposal, we will harness health data science to elucidate factors underlying TB transmission within households, as well as anti-TB treatment failure. We will leverage the computational infrastructure at Makerere and available demographic, clinical and laboratory data sets from TB patients and their contacts to develop AI/machine learning algorithms that identify: (1) patients at baseline (month 0) who will not achieve sputum smear and/or culture conversion at months 2 and 5, and are therefore at risk of TB treatment failure; and (2) contacts of index TB cases who are at risk of developing household TB disease, as well as contacts who may be resistant to TB infection despite persistent and/or multiple exposures to M. tuberculosis in the household.
Achieving these aims will provide evidence that data science methods are effective for early identification of potential TB cases and high-cost patients, and can therefore contribute to halting TB transmission in the community.


Advancing Public Health Research in Eastern Africa through Data Science Training (APHREA-DST)

Given the unprecedented abundance of increasingly complex and voluminous data across many domains of health, data scientists could play a transformative role in exploiting the big data revolution to address the multi-pronged health challenges in sub-Saharan Africa. However, there is a severe lack of well-trained data scientists and home-grown educational programs to enable context-specific training. We propose to advance public health research in Eastern Africa by establishing new multi-tiered training programs in health data science, with an initial focus on Ethiopia and Kenya owing to well-established partnerships and demonstrated needs. A partnership between Columbia University (CU, USA), Addis Ababa University (AAU, Ethiopia) and the University of Nairobi (UofN, Kenya) will leverage world-class strengths in data science at CU to enhance overall capacity in Ethiopia and Kenya, building upon the readiness and national prominence of AAU and UofN. Using in-person and distance modes of training, we will (i) develop new context-specific MS programs in public health data science, designed to be sustainable well beyond the funding period; (ii) undertake a faculty mentoring program to build and strengthen capacity in health data science for promising Eastern African scientists; and (iii) conduct a short-term training program structured around targeted short courses and workshops for a wide spectrum of trainees. The faculty mentoring mechanism will initially start with partnerships between CU and East African faculty and will progress into groupings across the three institutions. The skills developed through this program will in turn strengthen overall training and research capacity in data science. To broaden the reach into the scientific and policy community, the short-term training will engage trainees from partnering governmental and non-governmental stakeholders and the private sector.
The program will leverage several ongoing research projects led by team members or affiliated partners on environmental health, exposure assessment, satellite remote-sensing data, occupational exposures, climate change, infectious diseases, health surveillance, and health system monitoring and evaluation; these projects will serve as immersion opportunities that give trainees hands-on experience with new data science techniques. Monitoring and evaluation will track the success of the training programs, including trainees' achievement of their development goals, successful completion of research training, scientific presentations and publications, and the sustainability and growth of the MS degree programs. In Year 5, we will broaden the training program to the wider East Africa region by sharing curricula and inviting trainees to engage. We will also explore the feasibility of incorporating the courses we have developed into existing PhD curricula or creating new PhD programs in public health data science. Beyond the educational programs and collaborations, our project is designed to cultivate long-term regional collaboration, lifelong learning skills, and a supportive community of researchers committed to open science, algorithmic fairness, and data science for good, ultimately leading to better public health practice.

Computational Omics and Biomedical Informatics Program (COBIP)

Despite the significant human health and disease burden in Africa, no biomedical data science graduate degree programs in computational omics, clinical informatics and translational research are offered on the continent. Formal interdisciplinary training in biomedical data science is needed to cultivate graduates who can respond with agility to future biomedical data science needs and develop innovative solutions to African health challenges. Such training opportunities should include: (i) biomedical data science applied to data from multi-omics and other technologies, such as biomedical imaging, coupled with the ethical, legal and social implications of these advances; (ii) fundamental and advanced concepts in machine intelligence and computational paradigms for developing novel approaches to mining large-scale biomedical data; and (iii) awareness, amongst graduates, of career opportunities in biomedical data science, along with how the soft and hard skills gained in the training program can be transferred to a range of biotechnology/biomedical industries and research/professional careers. Motivated by these needs and leveraging the expertise in clinical and translational research as well as biomedical informatics at Oregon Health & Science University (OHSU), we propose to develop the “Computational Omics and Biomedical Informatics Program” (COBIP) at the University of Cape Town, South Africa. The program will introduce graduate degree programs to train African biomedical data scientists and faculty in rigorous fundamental data science, computational omics, clinical informatics and imaging data science. COBIP will lead to the development of solutions that address the African disease burden and are relevant to global health.
Specifically, we aim to: 1) Develop an interdisciplinary data science training program focused on the health and healthcare needs and priorities of Africa; 2) Train faculty with relevant disciplinary backgrounds, from collaborating African institutions, in biomedical data science to support the development of the field across the continent; and 3) Establish COBIP as an international center of excellence in computational omics and biomedical informatics, distributed across African institutions as a collaborative network of faculty, researchers, and students focused on African health priorities. COBIP will attract cohorts of trainees from diverse backgrounds including mathematics, statistics, informatics, computer science, engineering and biomedical sciences. COBIP will provide innovative educational infrastructure and research opportunities, as well as links between clinicians, researchers and biomedical industries through placements and internships. Through its graduates, COBIP will have a decisive impact on African biomedical data science research and will stimulate advances in diagnostics, therapeutic selection and drug development to support improved human health and healthcare in Africa and globally.

Data Science for Child Health Now in Ghana (DS-CHANGE)

Our goal for the DS-CHANGE Training Program is to build data science capacity at Kwame Nkrumah University of Science and Technology (KNUST) and develop a cadre of qualified data scientists focused on child health in Ghana. We will develop this cadre of scientists through mentored training and experiential learning that cut across biomedical data science disciplines (applied mathematics, computer science, clinical informatics, biostatistics, epidemiology), health conditions (malaria, injury and congenital anomalies), and biomedical domains (e.g., pediatrics, parasitology). Faculty and graduate student trainees will tackle computationally complex child health problems in Ghana. Our program focuses on three of the top 10 causes of child death and disability in Ghana: malaria, injury, and congenital malformations (orofacial clefts). Our aims are: Aim 1: Deliver a comprehensive, mentored, interdisciplinary training program that cuts across data science methodologies, health conditions, and biomedical domains to a diverse group of Ghanaian graduate-level MS and PhD trainees. Aim 2: Increase KNUST faculty and institutional capacity in biomedical data science by (a) facilitating cross-training in data science methodologies; (b) developing deeper expertise in biomedical data science methods; and (c) building teaching and mentoring proficiency in biomedical data science. Aim 3: Develop proficiency of faculty and graduate trainees in effective methods of team science so that interdisciplinary teams with minimally overlapping expertise can function synergistically. Collaboration: This program builds on established collaborations between KNUST and the University of Washington (UW), Seattle Children's Hospital and Research Institute, and the non-profit Smile Train.
Approach: KNUST graduate trainees will obtain a Master's or PhD degree in data science from KNUST, bolstered by a 3-month externship in Seattle and a thesis on a real-world Ghanaian child health problem. KNUST faculty trainees will participate in faculty exchanges with UW faculty. Select KNUST faculty will complete a UW Professional Certificate Program in a data science domain. All trainees will participate in a monthly Zoom seminar to enhance interdisciplinary effectiveness. Impact: We will train 13 graduate trainees and 16 faculty to build a robust biomedical data science graduate program at KNUST. Trainees will compete successfully for research funding, contribute to the evidence base in child health, and take up positions as leaders in data science and child health at KNUST and other Ghanaian institutions.

Makerere University Data Science Research Training to Strengthen Evidence-Based Health Innovation, Intervention and Policy (MakDARTA)

This Fogarty International research training award falls under the Harnessing Data Science for Health Discovery and Innovation in Africa (DS-I Africa) Research Training Program. The proposed grant's goal is to leverage the existing partnership between Makerere University and Johns Hopkins University to build a joint, comprehensive Data Science Research Training program to Strengthen Evidence-Based Health Innovation, Intervention and Policy in Uganda. Recognizing that the use of data science analytics in medicine is already generating new knowledge that advances prediction, performance and discovery in health, the training program aims to: (1) establish graduate degree training (Master's and PhD) in data science for graduating health professionals at the Makerere University College of Health Sciences (MakCHS); (2) provide postdoctoral training that will enable Ugandan H3Africa alumni to transition into faculty positions and/or become independent data science research leaders; and (3) train MakCHS faculty to enhance their capabilities for data science research and for mentoring data science trainees. The program will comprise higher-degree (Master's and PhD) and non-degree (faculty and postdoctoral) training, with tailored coursework and mentored research training to equip trainees with the requisite skill set for successful careers in data science. For degree training, our strategy will be to take advantage of existing infrastructure and training by gradually introducing data science training, selecting trainees from existing data science-relevant Master's and Doctoral programs. At the same time, we will expand the proposed Master's and Doctoral data science curriculum in accordance with Makerere University guidelines, foster collaborations with other DS-I Africa Training Program PIs with a focus on curriculum harmonization and mapping curricula onto core competencies, and seek national, regional and international accreditation/endorsement.
With significant investment in key infrastructure and training at Makerere over the last 10 years, we believe that sustainable research and training in data science for health is feasible and could be transformative for Uganda's health research and care. By the end of the grant period, the measurable indicators of increased research capacity we expect from the training activities will include, among others: significant new knowledge generated (as judged by the number of data science for health scientific publications, policy briefs, etc.), DS-I health innovations generated by trainees, and interdisciplinary DS-I research teams at Makerere.

NYU-Moi Data Science for Social Determinants Training Program

The overarching goal of the Data Science for Social Determinants (DSSD) program at New York University, Moi University and Brown University is to develop future leaders in data science who are equipped to collect and analyze data that better leverage rich survey data as well as internet and other digitized data sources, which can help capture information on the social determinants of health. DSSD is designed to rapidly expand the local base of expertise through curriculum development: 2 PhD trainees (4-year training) and a total of 6 postdoctoral (2-year) and faculty (12- to 14-month) trainees will train at NYU, while 8 Master's and 2 PhD trainees will commence or complete training (2-year and 4-year training, respectively) through newly developed data science tracks at Moi University. Connections with local data science industries and organizations (IBM, Deep Learning Indaba, DataKind, AI.Kenya and Aga Khan University Nairobi and Karachi) will create intellectual meeting spaces that attract a wide and varied pool of talented trainees from both data science and health backgrounds, propelling and sustainably advancing data science capacity in Kenyan institutions as well as in the DS-I Africa consortium.

Research Training in Data Science for Health in Rwanda

This Research Training Program (RTP) grant application for the DS-I Africa initiative (RFA-RM-20-016) is a partnership of 3 collaborating institutions: Washington University in St. Louis (WUSTL), the University of Rwanda (UR) and the African Institute for Mathematical Sciences (AIMS), the latter two both in Kigali, Rwanda. The overarching objective of the U2R RTP is to stimulate and catalyze innovation in Data Science (DS) for Health in Rwanda. A high prevalence of communicable diseases, coupled with a rapidly expanding epidemic of non-communicable diseases (NCDs), forecasts a perfect public health storm in Rwanda, providing impetus and rationale to leverage DS to address health care gaps. A structured program design will help develop trainee research careers in DS with a particular focus on health care topics relevant to Rwanda, including communicable diseases (e.g., HIV, malaria, COVID-19) and chronic NCDs (e.g., hypertension, heart disease, diabetes). We propose to expand an existing model curriculum for innovative and interdisciplinary DS for Health research training and practice by identifying additional competencies in three major scientific areas: 1) computer science & informatics; 2) statistics & mathematics; and 3) biomedical sciences & public health. The U2R RTP will: a) provide support (tuition, stipend, research funds) for 24 trainees: MS (n=14), PhD (n=6), post-doc (n=2) and junior faculty (n=2), with each trainee receiving training for 1-2 years based on individual needs; b) foster an innovative, transdisciplinary team science approach to research; and c) build institutional capacity at UR and AIMS to support the long-term sustainability of the program.
The intended outcomes are to: 1) train the next generation of data scientists with the expertise needed to solve challenging problems presented by the burden of communicable diseases and chronic NCDs in Rwanda; and 2) complete the full transition of U2R program leadership, faculty and institutional responsibility from US-Rwanda co-leadership to fully Rwandan leadership by the end of the U2R program. Trainees will participate in mandatory week-long "boot camps" (twice a year), monthly webinars and a mid-year meeting, with didactic training focused on career advice, research and academic advancement in highly relevant Data Science and health care topics (e.g., data management, bioinformatics, statistics, epidemiology, communicable diseases, NCDs, environmental exposure, dissemination and implementation). They will also take part in an intensive Grant & Manuscript Writing Skills Training Program geared towards developing competitive grant proposals ("Small Research Projects," SRPs) that trainees prepare, submit and conduct under the direction of a multidisciplinary mentoring team drawn from academic and governmental institutions. Effective mentoring during boot camps occurs through trainee "Group Brainstorming Sessions," in which trainees present their developing SRPs to obtain feedback. An extensive program evaluation by trainees and faculty will also provide feedback for improvement. The U2R RTP will establish and cultivate a sustainable program for training, research, mentoring and career development that will position UR and AIMS as national and regional African leaders and a global hub for Data Science for Health in Rwanda.

Research Training on Harnessing Data Science for Global Health Priorities in Africa

The training program, “Research Training on Harnessing Data Science for Global Health Priorities in Africa,” will build upon existing data science research capacity at the partnering institutions to develop innovative new data science research capacity. Harvard University will lead the training collaboration with Heidelberg University in support of the University of KwaZulu-Natal (UKZN) as a hub with four spoke partners in sub-Saharan African countries, namely Ghana, Nigeria, Tanzania, and Uganda. This program brings together a multidisciplinary team of researchers and expert faculty with in-depth experience in data science methods and in two domain fields identified as global health priorities: (i) health systems strengthening, and (ii) food systems, climate change and planetary health. The program will advance training in data science and leverage methods training to address policy-relevant questions in these two research domains. The research training program will be structured to provide an appropriate balance of short-, medium-, and long-term training opportunities for participants from South Africa and four other African countries in the ARISE Network, namely Ghana, Nigeria, Tanzania, and Uganda. The long-term goal of this training program is to develop researchers in Africa with advanced skills in health data science through a rigorous curriculum and a set of training and research activities designed to address the health needs of African countries.
To support this long-term goal, the program has the following 3 specific aims: Aim 1: Train mid-level and senior researchers at UKZN to work as educators and principal investigators leading independent research programs focused on using data science to address important global health challenges of national, regional, and global relevance; Aim 2: Build a critical mass of junior public health and medical professionals across South Africa and four other countries in Africa in the ARISE Network (Ghana, Nigeria, Tanzania and Uganda) who can design and successfully carry out rigorous health research projects using data science; Aim 3: Develop and institute a sustainable master's degree program in health data science at UKZN to serve the research training needs of South Africa and the African region. The program will train and support 17 master's degree students and 5 postdoctoral fellows (through 2-year fellowships), using a data science curriculum and training program that is (i) competency-based, (ii) mastery-oriented, (iii) application-focused, and (iv) digitally enhanced. The curriculum will be complemented by intensive, individual mentoring provided by a supervisory team, as well as by opportunities for practical training in internships, capstone projects, and academic research projects. The program will provide short-course instruction to 350 professionals and researchers, of whom 50 will complete a series of four courses and obtain a proposed graduate certificate in health data science innovation. Through monthly seminars and annual symposia attended by hundreds of program participants, the program will build a robust community focused on data science and its applications to global health. The program will integrate its work with the DS-I Africa Open Data Science Platform and Coordinating Center by sharing and leveraging data resources, data science technologies and other resources to improve health outcomes across Africa.