Background: Health information records in many countries, especially developing countries, are still paper-based. Compared with electronic systems, paper-based systems make data storage and data extraction more difficult. Given the importance of health records for epidemiological studies, guidelines for effective data cleaning and sorting are essential. They are, however, largely absent from the literature. This paper describes the process by which an algorithm was developed for cleaning and sorting a database generated from emergency department records in Lebanon.
Design and methods: Demographic and health-related information was extracted from the emergency department records of three hospitals in Beirut. Appropriate categories were selected for data categorization. For health information, disease categories and codes were selected according to the International Classification of Diseases, 10th Revision (ICD-10).
Results: A total of 16,537 entries were collected. Demographic information was categorized into groups for future epidemiological studies. Analysis of the health information led to the creation of a sorting algorithm which was then used to categorize and code the health data. Several counts were then performed to represent and visualize the data numerically and graphically.
Conclusions: The article describes the current state of health information records in Lebanon and the associated disadvantages of a paper-based system in terms of storage and data extraction. Furthermore, the article describes the algorithm by which health information was sorted and categorized to allow for future data analysis using paper records.
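As an illustration of the kind of sorting step described in the Results, the following is a minimal sketch, not the article's actual algorithm. It assumes each entry is a free-text diagnosis and assigns it to an ICD-10 chapter range by keyword matching. The keyword lists, the `UNSORTED` fallback, and the function names `clean`, `categorize`, and `count_categories` are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical keyword rules (assumption): the article's real rules are not
# reproduced here. Codes are ICD-10 chapter ranges.
RULES = [
    ("J00-J99", "Diseases of the respiratory system",
     ("asthma", "pneumonia", "bronchitis")),
    ("I00-I99", "Diseases of the circulatory system",
     ("myocardial infarction", "hypertension", "stroke")),
]

# Fallback for entries that match no rule and need manual review (assumption).
UNSORTED = ("UNSORTED", "Needs manual review")


def clean(text):
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def categorize(diagnosis):
    """Return (code range, chapter label) for the first rule whose keyword matches."""
    cleaned = clean(diagnosis)
    for code, label, keywords in RULES:
        if any(keyword in cleaned for keyword in keywords):
            return code, label
    return UNSORTED


def count_categories(entries):
    """Count how many entries fall into each code range."""
    return Counter(categorize(entry)[0] for entry in entries)
```

Calling `count_categories` on a list of transcribed diagnoses gives per-category counts that can then be tabulated or plotted. Entries returned as `UNSORTED` would still need manual review, which is why the fallback is kept explicit rather than forcing a match.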