Fourth Meeting of the DS-I Africa Consortium
16 - 22 November 2024
The Ravenala Attitude Hotel, Mauritius
Data diversity & human genomics: co-creating an approach tailored to African priorities
This paper summarises the learnings from a workshop at the DS-I Africa 2024 consortium meeting, run by Wellcome and attended by 70-80 participants. Huge thanks to participants for all their input.
1. Approaches to generating more diverse data
A role for funders to incentivise collaboration
Collaboration and knowledge exchange are essential for advancing sequencing efforts. This includes leveraging funders' networks to build connections across diverse groups (including pathogen genomics researchers).
Greater transparency and shared insights can help identify critical gaps and foster integration of different data sources, with funders using their convening power to facilitate access. Utilising existing data platforms can further enhance accessibility and collaboration. Drawing lessons from established initiatives like the UK Biobank or All of Us can provide valuable insights, but tailoring these approaches to align with African priorities will ensure relevance and impact. Partnerships among public, biotech and academic organisations hold immense potential to drive progress across the continent, and Wellcome’s focus on data diversity presents a powerful opportunity to harness enthusiasm and promote collaboration in this space.
Gaps in diversity in current efforts and ways forward for prioritisation
Current genomic data from Africa is heavily skewed toward West and South Africa, with regions like East Africa, the Congo Basin and the Great Lakes significantly underrepresented. Efforts should be directed towards sequencing samples from these underrepresented regions. Additionally, existing national genome projects must ensure they capture the full diversity within countries, addressing geographic, ethnolinguistic, and population-level gaps; currently they mostly focus on populations living in urban areas. Current genome initiatives also often rely on low-coverage whole genomes or genotyping arrays, which overlook rare variants and structural variation and so miss relevant insights.
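As a rough illustration of why low coverage can miss variants, the sketch below estimates the chance of detecting a heterozygous variant in one carrier at a given read depth. It rests on simplifying assumptions that are not from the workshop: exactly `depth` reads cover the site, each read carries the variant allele with probability 0.5, there are no sequencing errors, and a variant is only called when at least `min_alt_reads` reads support it. The function name and threshold are hypothetical.

```python
from math import comb


def prob_het_detected(depth: int, min_alt_reads: int = 2) -> float:
    """Chance of seeing at least `min_alt_reads` variant reads at a heterozygous site.

    Assumption (illustrative only): a simple binomial model with exactly
    `depth` reads, each carrying the variant allele with probability 0.5.
    """
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    # Count the outcomes with too few variant reads to make a call.
    too_few = sum(comb(depth, i) for i in range(min(min_alt_reads, depth + 1)))
    return 1.0 - too_few / 2 ** depth


# Compare low-coverage depths with a typical high-coverage depth.
for depth in (2, 4, 8, 30):
    print(f"depth {depth:>2}: detection probability {prob_het_detected(depth):.4f}")
```

For a rare variant carried by only a handful of participants, a miss in each carrier is far more likely at low depth, which is one reason such variants are overlooked.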
Existing biobanks and cohorts could be leveraged, prioritising samples that align with identified gaps, while also undertaking targeted new recruitment to represent populations missing from current efforts and enable deeper phenotypic data collection. Special attention should be given to rural populations and indigenous peoples (who are mostly nomadic and semi-nomadic), as current studies predominantly focus on urban centres. Cataloguing existing biobanked samples can highlight gaps in regional, linguistic, and ethnic representation, guiding efforts to fill these voids through additional sampling and sequencing. Establishing an Africa-focused biobank could provide a centralised resource to support these efforts. If setting up new cohorts, it is essential to enable recontact and ensure robust representation of diverse populations.
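The cataloguing step described above could start as a simple tally of biobanked samples against a list of groups that should be represented. The following is a minimal sketch only: the record layout, the `region` field, the group list and the `min_samples` threshold are all hypothetical, and a real catalogue would also need linguistic, ethnic and phenotypic fields agreed with the communities concerned.

```python
from collections import Counter


def representation_gaps(samples, groups, min_samples):
    """Return the shortfall per group whose sample count is below `min_samples`.

    `samples` is a list of dicts with a hypothetical "region" key;
    `groups` lists every group that should be represented.
    """
    counts = Counter(sample["region"] for sample in samples)
    return {
        group: min_samples - counts[group]
        for group in groups
        if counts[group] < min_samples
    }


# Made-up catalogue entries, for illustration only.
catalogue = [
    {"id": "S1", "region": "West"},
    {"id": "S2", "region": "West"},
    {"id": "S3", "region": "West"},
    {"id": "S4", "region": "South"},
    {"id": "S5", "region": "South"},
    {"id": "S6", "region": "East"},
]
gaps = representation_gaps(catalogue, ["West", "South", "East", "Central"], min_samples=2)
print(gaps)
```

Groups that are absent from the catalogue still appear in the result, because the tally is driven by the list of expected groups rather than by the samples that happen to exist.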
For all sequencing efforts (through genome projects, cohorts or biobanks), there is a need to factor in regional disease priorities, with a particular focus on non-communicable diseases (as the focus so far has been on infectious diseases). Linkage to epidemiological data and other health-related datasets, including longitudinal, lifestyle, and clinical data, is crucial for comprehensive insights.
As a complementary approach, exploring ancient DNA in Africa offers an avenue to investigate evolutionary questions, such as Neanderthal introgression and its implications for modern disease susceptibility.
Identifying the right methodology
Balancing long- and short-read sequencing is essential. Long-read sequencing, though not yet well established on the continent, represents a transformative opportunity, but it requires high-quality samples, so it cannot rely solely on existing biobanks. Its application should go beyond variant detection to include epigenetic studies, offering deeper insights into genomic regulation. For broad population studies, short-read sequencing remains a cost-effective approach, while long-read sequencing could be prioritised for underrepresented populations or those with high structural diversity. Establishing a robust, diverse, representative dataset, potentially encompassing 3 million African genomes, would significantly advance understanding of genetic diversity.
Integration of transcriptomics and proteomics with genome sequencing is equally crucial, addressing current gaps in these areas to create a more comprehensive biological framework. Reducing the costs of library preparation kits and reagents, which remain disproportionately high in Africa (and vary hugely across the continent), will be vital to scale these efforts sustainably and equitably. Sequencing technology also advances very quickly, which is challenging for existing African sequencing infrastructure. Limited access to high-performance computing (HPC) equipment with graphics processing units (GPUs), and to funding for data processing, is a further challenge. Equitable frameworks for public-private partnerships could be very impactful in reducing these costs and addressing the challenges associated with fast-moving technologies.
2. Linking sequencing to other outcomes
Sequencing and capacity building
Addressing the shortage of local trainers in specialised genomics is crucial for advancing the field in Africa.
Boosting funding to develop and support trainers can create a sustainable pipeline of expertise. While the current focus in genomics leans heavily toward infectious diseases, capacity-building efforts must also prioritise human genomics to address broader health challenges. Retaining this capacity is equally vital; creating local opportunities in research and academia ensures that trained individuals can contribute meaningfully without seeking opportunities elsewhere.
A decentralised governance model could further strengthen these efforts. Establishing regional hubs across Africa to train researchers in bioinformatics, genomics, and data analysis would not only decentralise knowledge but also foster collaboration and innovation. These hubs would empower local scientists to lead research that addresses the continent’s unique health priorities while building a strong, self-sustaining genomics ecosystem.
Sequencing and tool development
Developing tools tailored to African data is essential for advancing genomics on the continent. This includes creating resources for genome-wide association studies, risk prediction, and genomic software specifically designed to address African datasets.
Additionally, artificial intelligence and machine learning models trained on non-African datasets perpetuate biases in predictive tools, such as polygenic risk scores and disease modelling. Developing models rooted in African datasets is critical to ensuring accuracy and relevance. Centralised African genomics databases, integrating environmental, clinical, and social determinants of health, would provide researchers with accessible, multi-layered datasets for impactful insights.
Global reference genomes currently exhibit biases, underscoring the need for population-specific references. Building upon and strengthening efforts like the African pangenome/African reference genome projects, tied to sequencing initiatives, could play a transformative role in building capacity and creating these tools.
Innovative approaches, such as hackathons (or prizes), can catalyse the creation of these tools by fostering collaboration and innovation. Equally important is securing funding to engage the public in the development process, building trust and ensuring that these tools are both ethical and aligned with community needs.
Sequencing and research questions
Africa's unique genetic diversity offers immense opportunities for transformative genomic research. Studies exploring genetic factors influencing disease susceptibility and resistance—such as malaria, sickle cell anaemia, tuberculosis, and HIV—could yield critical insights. Research in evolutionary genomics, including ancient DNA analysis and population genetics, can illuminate migration patterns and evolutionary adaptations.
Rare disease studies, leveraging Africa’s unparalleled genetic variation, have the potential to uncover rare variants and their functional effects, advancing global understanding of these conditions.
Integrating genomic data with hospital records and electronic health systems is crucial to connect research with tangible health outcomes. Genomic insights could guide impactful population-level interventions, such as refining vaccination strategies, implementing targeted screenings, or studying the interactions between genetics and environmental factors like climate change and pollution exposure. Cross-country collaborations would enhance these efforts by sharing expertise, samples, and data, with the active involvement of local governments to integrate findings into public health initiatives.
3. Enabling environment for sequencing
Tackling data sharing challenges
To ensure sustainability and equity, policies supporting data sovereignty must be prioritised, guaranteeing that data generated in Africa remains accessible to African researchers and delivers direct benefits to local communities.
Ethical and equitable data sharing requires addressing disparities in data-sharing policies from the outset. Regional data protection agreements can provide transformative guidelines for sharing sensitive datasets, particularly those linked to health outcomes. Transparency about existing data-sharing legislation is essential to harmonise country- and region-specific policies into cohesive frameworks. A communal effort to develop standardised approaches at regional or continental levels would ensure consistency and trust among stakeholders. A unified framework enabling seamless access to data across national borders could significantly enhance collaboration and research outcomes.
Federated data analysis offers a promising solution, allowing insights to be derived without the physical transfer of sensitive data, thereby respecting sovereignty and privacy concerns.
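To make the federated idea concrete, the sketch below pools an allele frequency across two sites while only aggregate counts leave each site. It is a toy illustration under stated assumptions, not a description of any existing platform: the `SiteSummary` type, the function names and the genotype values are hypothetical, and genotypes are assumed to be coded as 0, 1 or 2 copies of the variant allele per diploid participant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts that a site agrees to share (hypothetical format)."""
    alt_alleles: int
    total_alleles: int


def local_summary(genotypes):
    """Runs inside a site: reduce individual genotypes (0, 1 or 2) to counts."""
    return SiteSummary(alt_alleles=sum(genotypes), total_alleles=2 * len(genotypes))


def pooled_frequency(summaries):
    """Runs at the coordinator: combine site-level counts into one frequency."""
    alt = sum(s.alt_alleles for s in summaries)
    total = sum(s.total_alleles for s in summaries)
    if total == 0:
        raise ValueError("no alleles reported by any site")
    return alt / total


# Made-up genotypes held at two sites; these lists never leave their site.
site_a = local_summary([0, 1, 2, 0])
site_b = local_summary([1, 1, 0, 0, 0, 2])
print(pooled_frequency([site_a, site_b]))
```

Aggregates drawn from very small groups can still reveal information about individuals, so a real deployment would likely pair this pattern with minimum group sizes and agreed governance rules.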
Implementing ethical approaches in genomics research
Community engagement and feedback are essential at every stage of genomics research to ensure ethical, impactful, and equitable outcomes. Benefit-sharing should be a cornerstone of this work, with tangible advantages flowing back to the communities contributing data.
Addressing ethical concerns, such as participants’ understanding of consent, requires enhanced ethics training for researchers and better communication methods tailored to local languages. Building trust—both at the community level and among researchers—is critical, particularly for data sharing and the long-term success of genomics initiatives.
Ethics committees often pose challenges to genomics studies because they lack genomics expertise. Many committees may not be well equipped to evaluate ethics applications for genomics studies or their approach to data and sample handling. This highlights the need for targeted ethics education for genomics and clearer policies on sample collection, storage, and usage. Rethinking consent for long-term storage is vital, alongside mechanisms to transparently show participants how their data has contributed to advancements over time. Existing resources, such as H3Africa samples, should be optimised for longer-term use, supported by long-term funding for sustainable sample storage.
Policy makers must be actively engaged to recognise the transformative potential of genomics. Learning from successful initiatives in other geographies like UK Biobank and Genomics England can help shape policies that maximise the value of samples and ensure broader population benefits.