Establishing a sorting protocol for healthcare databases
  • Elie Ghabi
    Faculty of Medicine, University of Balamand, Lebanon.
  • Wehbeh Farah
    UEGP, Faculty of Sciences, Saint Joseph University of Beirut, Lebanon.
  • Maher Abboud
    UEGP, Faculty of Sciences, Saint Joseph University of Beirut, Lebanon.
  • Elias Chalhoub
    Medical Laboratory Sciences Department, Faculty of Health Sciences, University of Balamand, Lebanon.
  • Nelly Ziade
    Faculty of Medicine, Saint Joseph University of Beirut, Lebanon.
  • Isabella Annesi-Maesano
    Institut Pierre Louis d’Epidémiologie et de Santé Publique, Equipe EPAR, Sorbonne Universités, Paris, France.
  • Laurie Abi-Habib
    Public Health Department, Faculty of Health Sciences, University of Balamand, Lebanon.
  • Myriam Mrad Nakhle
    Public Health Department, Faculty of Health Sciences, University of Balamand, Lebanon.


Background: Health information records in many countries, especially developing countries, are still paper-based. Compared with electronic systems, paper-based systems are at a disadvantage in data storage and data extraction. Given the importance of health records for epidemiological studies, guidelines for effective data cleaning and sorting are essential, yet they are largely absent from the literature. This paper describes the process by which an algorithm was developed for cleaning and sorting a database generated from emergency department records in Lebanon.

Design and methods: Demographic and health-related information was extracted from the emergency department records of three hospitals in Beirut. Appropriate categories were selected for data categorization. For health information, disease categories and codes were selected according to the International Classification of Diseases, 10th Revision (ICD-10).

Results: A total of 16,537 entries were collected. Demographic information was categorized into groups for future epidemiological studies. Analysis of the health information led to the creation of a sorting algorithm, which was then used to categorize and code the health data. Several counts were then performed to summarize the data numerically and graphically.
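As an illustration only, the kind of categorize-then-count step described above could be sketched as follows. This is not the algorithm developed in the study, whose rules are not given in this abstract. The keyword table, the `UNCODED` label, and the function names `normalize`, `assign_code`, and `count_codes` are all hypothetical; the three ICD-10 categories shown (J45 asthma, J18 pneumonia, I21 acute myocardial infarction) are used purely as examples.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical keyword table for illustration; NOT the study's actual rules.
ICD10_KEYWORDS = {
    "J45": ("asthma",),
    "J18": ("pneumonia",),
    "I21": ("myocardial infarction", "heart attack"),
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def assign_code(diagnosis: str) -> Optional[str]:
    """Return the first ICD-10 category whose keyword appears in the entry."""
    cleaned = normalize(diagnosis)
    for code, keywords in ICD10_KEYWORDS.items():
        if any(keyword in cleaned for keyword in keywords):
            return code
    return None  # left for manual review


def count_codes(diagnoses: Iterable[str]) -> Counter:
    """Count entries per assigned code; unmatched entries go to 'UNCODED'."""
    return Counter(assign_code(d) or "UNCODED" for d in diagnoses)


if __name__ == "__main__":
    sample = ["Acute ASTHMA attack.", "Pneumonia", "asthma", "illegible entry"]
    for code, n in count_codes(sample).items():
        print(code, n)
```

Entries that match no keyword are kept under a separate label rather than dropped, so that hand-written records that cannot be coded automatically remain visible for manual review.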

Conclusions: The article describes the current state of health information records in Lebanon and the disadvantages of a paper-based system for storage and data extraction. It also describes the algorithm by which health information from paper records was sorted and categorized to allow for future data analysis.

