Valley fever (VF) is a fungal infection caused by Coccidioides species that grow in desert regions of the US and Central and South America. Of cases reported to the US Centers for Disease Control and Prevention, most were found to be in Arizona (64%) and California (33%) (based on 1998-2014 data). In Arizona, 90 cases per 100,000 were reported in 2013 with an associated hospitalization cost of $53 million.1 In the ten years between 2003 and 2013, VF incidence in Arizona nearly doubled,1 but fine scale (local) risk factors involved in the acquisition of this illness remain unknown.2,3 The fungus grows in the arid soil of the US southwest. Its growth likely follows a seasonal pattern where it grows during the fall monsoon season and then forms arthroconidia and may become airborne through wind or mechanical disturbance.4 Infection occurs when the aerosolized spore lodges in the lung of humans and other mammals. While much is understood about the VF pathology, delineating the spatial extent of VF risk is hampered by challenges with detection of the pathogen in soil, wide spectrum of disease, imperfect diagnostic tests, and changes in case reporting.5
Much of the work utilizes the case residence for mapping cases, which poses an additional problem of representing cases based on where they live rather than where they may have been exposed. This is in part because resources are limited to investigate all cases for travel history during the 7-28 day incubation period restricting the capacity to trace probably environmental exposures. These residential point data are then matched to exposure/risk variables such as zip-code level demographic data or, in the US, Landsat level environmental data. In their analysis of VF cases in Maricopa County, Arizona, Park et al.6 standardized the number of cases by zip-code level age-adjusted population density in order to map the distribution of VF cases.
In addition, there are no methods to perform large scale soil sampling for this fungus.2 Despite the challenges to mapping the fungus itself, soils associated with semiarid regions have been shown to support Coccidioides growth and soil has been used to map risk.7,8 Baptista-Rosas et al.9 used a variety of climate and topographical data (~1 km2 spatial resolution) trained on 18 soil sampling sites with 16 from California, and 1 each from Arizona and Sonora, Mexico to map the ecological niche of Coccidioides spp. in southern California, Arizona, and in Sonora Mexico. When overlaid with a scanned map with an unknown resolution of VF infections in Mexico from 1966,10 ecological niche models identified areas of increased risk. Others used Landsat TM (~25m2 resolution) to map proxies,11 like construction and archeologic digs, a known risk factor for VF because the disturbance of soil or air may aerosolize the fungal spores.12,13 Finally, rodent burrows have been associated with Coccidioides growth and exposure where the micro-climate created in burrows may help promote fungal growth.2,3,14,15
Until a more efficient method for detecting the fungus in soil is developed, public health prevention messaging tailored to specific environmental risk areas will continue to be based primarily on human case data. In this study, we evaluate social and environmental risk factors associated with VF cases for the state of Arizona. While this study contributes to the VF literature by examining the spatial distribution of VF cases and the association with a number of potential risk factors in a high risk state, we also highlight how the uncertainty associated with spatial scale and aggregation affects risk mapping. The VF dataset epitomizes the issues associated with spatial scale in public health studies as it has been a common practice that health and demographic data are aggregated to protect individual identity and the commonly utilized supplemental environmental data are acquired from multiple sources and at multiple scales. The aim of this study was to not only evaluate risk factors, but to use the analysis as an example to highlight the importance of addressing spatial scale in public health.
Materials and Methods
Study area and Valley fever data
Arizona accounts for approximately two thirds of all reported cases of VF in the U.S. Individual VF cases in Arizona from 2006 to 2009 provided by the Arizona Department of Health Services (ADHS) were considered in this study. Approval from the University of Arizona Institutional Review Board and the ADHS Human Subjects Review Board was obtained to conduct analysis of these surveillance data. During this time period the case definition used by Arizona, i.e., laboratory confirmation only, and diagnostic tools were consistent and, since 2006, the Health Department has used MEDSIS for surveillance. In the middle of 2009, a major commercial laboratory began reporting positive EIAs, which inflated the incidence of disease compared with previous reports). Furthermore, subsequent changes in reporting and testing preclude the use of more recent data. The data contain information on gender, age group (5 year intervals), date of diagnosis, and ethnicity. Because the ethnicity data were incomplete, this information was not used for this analysis. Of the 19,816 reported cases, 13,940 (70.3%) had sufficient address information to be successfully geocoded. Due to missing demographic data, only 12,349 (62.3% of the total reported) individuals were included in our analysis (Figure 1). When aggregated, the analysis included 4178 Census block groups, 1526 Census tracts, and 348 zip codes.
Pseudo-absence data were generated for controls using the block group because it is the smallest Census unit containing socioeconomic information. The number of pseudo-absences per block group were generated as a proportion to the population in the block group where, for block group i, the number of pseudo-absences was Nti/T, where N is the total number of pseudo-absences to be generated, ti is the population in block group i, and T is the total population in the study area. We generated 50,000 pseudo-absences with a random distribution for better model reliability.16 Locations in block groups with no population or missing data were eliminated yielding 45,460 pseudo-individuals with attributes assigned.
Predictor variables: Data were extracted at the point level and at aggregations of block group, tract and zip code. The three areal units represent the geographic scale commonly adopted by researchers and public health practitioners for risking mapping given that other socioeconomic data are often available at these scales.
Because of high incidence rates among elderly individuals,17 we expected a positive association between age and cases. Demographic data included the 2010 decennial data from the U.S. Census Bureau (population, age distribution and income). In the VF case data set, because age was reported in 5-year groups, the median value of the age group was assigned to each individual. For areal units, population over 65 was calculated based on the Census age group data by summing the population over all age groups over 65. Socioeconomic factors have been found to be associated for many other health outcomes and specifically with the development of severe pulmonary coccidioidomycosis.18,19 As no income information was reported for cases, income was interpolated based on the distribution of income groups at the block group level for each individual’s residence. Median income data at the block group or tract level were available, but had to be interpolated using block group data for the zip code level.
Using raster data from the Gridded Soil Survey Geographic conducted by USDA,20 we extracted the attributes of soil organic carbon (SOC) and available water storage (AWS). Generally, the fungus is associated with dry desert soils.21 However, recovery of the fungus from those soils tend to be associated with rodent burrows or other more humid loci with higher organic content areas.2,22,23 For the areal analysis, the SOC and AWS value of an area is defined as the mean value of all raster cells within the geographic unit. For the individual level analysis, the soil property was derived based on an individual’s residential site. For the aggregated analyses, soil property data in a study unit were averaged using observations (10 × 10 m resolution) falling within the unit.
Coccidioides species are not typically associated with cultivated land cover.21 To assess land cover, five land cover types (medium density residential area, high density residential area, shrub/scrub, pasture/hay and cultivated crops) at 30 by 30 m resolution were extracted from the National Land Cover Database 2006.24 These data were categorical variables and for the individual level analysis, class was assigned based on the location of residence. For the aggregated analyses, each land cover variable was calculated as the percentage of the particular land cover relative to the entire unit.
Distance to desert (shrub-brush rangeland in USGS classification)25 was calculated using shapefiles from the USGS.26 Distance was measured from an individual’s residence to the nearest desert identified in the shapefile. For the aggregated analyses, distance was measured from a unit’s centroid to the nearest desert identified in the shape file.
In addition to land cover above, we calculated distance to water which has been associated with burrows in an arid region.27 In Arizona, while waterways may be dry for most of the year, they serve as wildlife thoroughfares and represent more natural habitat within a city. Distance to wetland (km) was measured from an individual’s residence or from the aggregation unit’s centroid to the nearest wetland identified in the US Fish and Wildlife shapefile.28
We conducted the analysis at individual and area (block group, tract and zip code) levels to assess how scale influences predictors of VF incidence. At the individual level, a logistic regression model was built to examine the individual-based risk factors, while the Poisson distribution was found to be appropriate for modeling the observed cases at each of the three aggregation scales. Predictors included age (or total population over 65 for aggregated analysis within the uniform populations of Census units), soil property (i.e., SOC and AWS), land cover types (or percent in unit for aggregated analysis), distance to wetland, distance to desert, and income level. All predictor variables were included in the final models to map VF incidence. The incidence calculated by the predict function in R was mapped using ArcGIS. Choropleth maps with five categories were generated to show the predicted value and the 95% confidence interval of the regressions.
While eight variables were found to have a consistent association with VF cases across the scales of analysis and are factors found to be associated with VF incidence in the literature, three were inconsistent across scale (Table 1). Distance to wetland was significant at the block group and tract levels, but not at the individual and zip code levels. Although regression results at the tract level and zip code level indicated a significant positive association with shrub/scrub land cover, results at the block group level were not significant and the association was negative at the individual level analysis. Similarly, while area-based analysis results suggested a positive association between cultivated crops and VF cases, results at the individual level revealed an opposite association.
While certain variables were found to vary more at a coarser scale with a higher level aggregation (e.g., distance to wetland), some variables (e.g., median income) demonstrated a smoothing effect at larger scales with smaller variation (Table 2). Some other variables (e.g., distance to desert and soil organic carbon) showed inconsistent variation across the spatial scales. Although the five land cover types computed using three aggregation schemes provide a similar amount of overall coverage of the entire study area (about 80% at the block group level and the tract level and 77% at the zip code level), the greater aggregation to the zip code exhibited much variation among these land covers. For example, among the five land covers, medium/high density residential areas on average account for 69% at the block group level and 61% at the tract level but a much smaller percentage, 28%, at the zip code level, with shrub/scrub as it’s effective reciprocal (25%, 31%, and 65% respectively) .
In addition to the land cover variables where spatial uncertainty with great variation is inherently embedded in the spatial aggregation process due to change of analysis scale, uncertainty was also introduced when the distance variables were created, with the average decreasing at coarser analysis scales. For the two distance variables examined in this study, the maximum distance to wetland decreased from 24 km at the block group level to 2 km at the zip code level, and the maximum distance to desert reduced from 39 km at the block group level to 17 km at the zip code level.
Implications of scale
Using the coefficient weightings identified for the model at each scale (Table 1), we mapped expected VF incidences across the three area-based aggregation scales, a) block group, b) tract, and c) zip code (Figure 2, expected VF cases per 1000 people). The foci - darker colors indicating greater number of cases per population - move in space depending upon the level of aggregation. The error associated with these scales is also presented (Figure 3) by mapping the width of 95% confidence intervals around the incidence estimates. These error maps show that the error differs across scales and the analysis at the largest scale, the zip code level (Figure 3C), tends to give high error at more locations and in more northerly areas where endemicity is less established.
Across analytic scales, consistent associations with the number of reported VF cases and risk factors including proportion elderly, income status, the two soil properties, the land covers of residential area (high/medium density) and pasture/hay, and distance to desert were found (Table 2). The positive association with elderly is consistent with previous research17 as elderly in Arizona may be more susceptible because of length of residency/immigration (Phoenix and Tucson hosting a large number of retirement communities) or compromised immune systems. The positive association with income level might not necessarily mean that people with higher income are more likely to contract VF rather a bias may exist in the reported cases considering the associated access to and ability to pay for healthcare. That is, when symptomatic, people with lower income level have been found to be more likely to delay health care (and for the VF disease, the symptom generally resolve within weeks) due to the cost concern.29
The associations with certain land cover types identified (e.g., residential), soil properties, and proximity to desert were consistent with the previous research with respect to desert soil types supporting the fungal growth,22,30 as well as places were soil disturbance has been associated with exposures.6,12,13 The negative correlation between soil organic carbon and the VF cases may be indicative that this predictor is capturing the soil type that supports Coccidioides species rather than the association with rodent burrows where the fungus has been more often recovered.2,21-23 Some studies found a significant association with the timing of precipitation and growth of Coccidioides,4,31 the amount of water and organic carbon in soil supports fungal competitors which may impede Coccidioides species growth.15 The positive association with medium and high density residential land cover may relate to case attribution: patient’s home addresses, which are mostly located in residential areas. The consistent predictors identified across the state as well as multiple spatial scales lend additional support to these as environmental risk factors for VF.
The inconsistent association of VF cases with possible risk factors of two land cover types (shrub/scrub and cultivated crops) and distance to wetland suggests the existence of biased and/or inefficient estimates when analysis scale changes. In light of the lack of methods for field recovery of specimens and reporting changes associated with surveillance for VF, the issues with scale of analysis further complicate public health risk messaging and attempts at quantifying the association between VF incidence and weather.31 Moreover, this finding highlights the importance of considering scale when interpreting spatial risk or when comparing between studies conducted at multiple scales.
While it is challenging to identify the mechanism behind how spatial scale affects analysis results, the somewhat contradicting results at the area level with several predictors highlight problems associated with scale and aggregation.32,33 The kind of data used here, especially distance/proximity based evaluation of potential environmental exposure are one of the most common spatial methods in public health studies.34 Our study suggests that the spatial uncertainty introduced in the modeling process may relate to the scale of the spatial units used to compute distances.
In addition to uncertainties due to spatial scale, other uncertainty may be associated with exposure site in deriving the risk factors or the accuracy of the available environmental data to capture the biological processes being explained. Similar to many other public health studies, variables included in the model are simply proxies of the potential exposure. For example, the distance to desert variable (measured from an individual’s home) was used to approximate the possibility and frequency of an individual visiting desert where the exposure may occur or that they might be exposed to blown spores at the home. An improvement may come from data on time spent outdoors, which has been shown to be an important risk factor in childhood lead exposure.35 Alternatively, error may also arrive from utilizing national databases to describe local phenomena. For example, the shrub/scrub classification for Arizona encompasses the Sonoran Desert of the south as well as the Colorado Plateau in the northeast. Failing to construct an appropriate exposure measurement can lead to systematic bias.36,37 A more accurate way to model the risk factors or identify potential exposure sites would be to derive an individual’s activity space with the associated time duration and then evaluate its overlay with the potential risk factors of soil, desert, water or various land covers. However, deriving such activity space can be challenging given our limited knowledge on how each individual moves in spacetime. An integration of travel diary data,38 geotagged social media data,39 or GPS data appears promising in more accurately delineating one’s exposure territory.
In this study, we explored the association of VF cases with a range of socioeconomic and environmental variables and analyzed the potential spatial scale related uncertainty. Our findings support other work indicating social factors of age and income as well as environmental factors associated with increased risk of VF incidence. However, we also found inconsistent associations across spatial scales for certain variables. Potential issues associated with spatial data alignment and variable construction at alternative scales were discussed. The analysis highlights both the significance of uncertainty with respect to spatial data manipulation and how that is compounded when we have limited exposure data to correct or train the models.
While there is no consensus as to which scale should be adopted in a public health study or how to select an optimal scale, comparing across scales for consistency and being aware of the implications of choosing a given scale will affect the effectiveness of public health interventions on populations at risk. In the absence of a global solution, evaluating associations at multiple scales, especially for spatially explicit variables or factors that vary significantly in space, may help to identify factors with greater uncertainty if the direction of the association changes across spatial scales.