Tracing Tokyo’s air sources to identify Kawasaki Disease’s etiological triggers

Kawasaki Disease
Epidemiology
Atmospheric Transport
HYSPLIT
Python
Reproducibility
Conference report on Kawasaki disease incidence in Tokyo and differential HYSPLIT back-trajectory source regions.
Author

Alejandro Fontal

Published

April 3, 2022

This document intends to expand, provide context and reproducible analysis code on the study presented in a poster for the 20th International Vasculitis and ANCA Workshop, which takes place in Dublin, Ireland, from the 3rd to the 6th of April 2022.

The original code and analysis notebook is available in the source repository.

Table of contents

Background

Kawasaki Disease (KD) is a systemic vasculitis that mainly affects children younger than 5 years old. Although KD cases have been registered in over 60 countries across several continents, its incidence is highest in East Easia, particularly in Japan, where the highest annual incidence rate was recorded in 2018: 359 per 100 000 children aged 0-4 years (Ae et al. 2020). After more than five decades since its discovery and active research, the etiology of KD is yet to be elucidated. Recent studies have analyzed the association between KD and diverse environmental factors, with some advances pointing towards a relevant role of the atmospheric transport of a wind-borne agent triggering the disease (Rodó et al. 2011, 2014). The specific nature of this agent(s) is still unknown, with biological (bacterial, fungal or viral) and chemical (pollution) elements being the strongest candidates to initiate the autoimmune cascade which leads to the disease.

One of the most striking features of the epidemiological dynamics of Kawasaki Disease is the defined seasonal structure across multiple countries, with broad coherence in fluctuations of cases across the Northern Hemisphere extra-tropical latitudes (Burns et al. 2013).

In the case of Japan, the number of admissions has been recorded in the epidemiological records since 1970, and show the following pattern:

KD Records Japan

KD Records Japan

Three main features are notable:

  • The epidemic peaks of 1979, 1982 and 1986 (and to a lesser degree, 1984).
  • The increasing trend starting around 1995 until today.
  • The marked seasonality from 2000 onwards.

All of this prompts several questions towards the nature of the drivers of these changes in incidence for a disease that is non-communicable in nature.

In this work, we focus on the period from 2011 to 2018 for the prefecture of Tokyo, and try to associate the seasonal maxima and minima to atmospheric transport patterns.

Methods

Kawasaki Disease

Source

We collected data on date of admission and reported date of symptoms onset for a total of 13970 KD cases in hospitals belonging to the Tokyo prefecture from 2011 to 2018. This data was originally collected by the (22nd to 25th) nationwide epidemiological surveys of Kawasaki disease in Japan (Makino et al. 2015, 2018, 2019; Ae et al. 2020).

kd_japan = (pd.read_csv('../data/kawasaki_disease/japan_ts.csv', index_col=0)
            .reset_index()
            .rename(columns={'index': 'date'})
            .assign(date=lambda dd: pd.to_datetime(dd.date))
)

Computation of KD maxima and minima

The data was aggregated to generate daily counts of hospital admissions and registered onsets.

kd_tokyo = pd.read_csv('../data/kawasaki_disease/tokyo_ts.csv')

(kd_tokyo
 .assign(rolling_7=lambda dd: dd.cases.rolling(7, center=True).mean())
 .assign(rolling_56=lambda dd: dd.cases.rolling(7 * 8, center=True).mean())
 .rename(columns={'cases': 'Daily cases',
                  'rolling_7': '7 Days MA',
                  'rolling_56': '8 Weeks MA'})
 .melt(['date', 'year'])
 .dropna()
 .pipe(lambda dd: p9.ggplot(dd)
       + p9.aes('date', 'value', color='variable', group='variable')
       + p9.geom_line(p9.aes(alpha='variable'))
       + p9.scale_alpha_manual([1, 1, .2])
       + p9.scale_color_manual(['black', 'red', 'black'])
       + p9.scale_x_datetime(labels=date_format('%Y-%m'),
                             breaks=date_breaks('20 months'))
       + p9.theme(figure_size=(4, 2), title=p9.element_text(size=11))
       + p9.labs(x='', y='Registered KD cases',
                 title='Daily KD cases in Tokyo', color='', alpha=''))
)

Daily counts

Daily counts

The daily counts were then aggregated by week to obtain a weekly estimate of case onsets per day (that is, the daily average for each consecutive week in a year).

kd_weekly_tokyo = (kd_tokyo
    .assign(date=lambda dd: pd.to_datetime(dd.date))
    .assign(week=lambda dd: (((dd.date.dt.dayofyear - 1) // 7) + 1).clip(None, 52))
    .groupby(['year', 'week'], as_index=False)
    .cases.mean()
)

To generate the yearly maxima and minima, we selected the 5 weeks with the most daily cases and the 5 weeks with the least daily cases for each year, respectively, and are shown in the following figure:

kd_weekly_tokyo_minmax = (kd_weekly_tokyo
 .groupby(['year'])
 .apply(lambda dd: pd.concat([
     dd.sort_values('cases').iloc[:5].assign(label='KD Minima'),
     dd.sort_values('cases').iloc[-5:].assign(label='KD Maxima')
 ]))
 .reset_index(drop=True)
)

min_max_weekly_dates = (kd_weekly_tokyo_minmax
 .merge((pd.DataFrame(dict(date=pd.date_range('2011', '2018-12-31')))
         .assign(year=lambda dd: dd.date.dt.year)
         .assign(week=lambda dd: (((dd.date.dt.dayofyear - 1) // 7) + 1)
         .clip(None, 52))))
 .drop(columns='cases')
)

Weekly counts

Weekly counts

Trajectory Generation

To model air mass back-trajectories, we used the Hybrid Single Particle Lagrangian Integrated Trajectory (HYSPLIT) model version 5.2 (Stein et al. 2015), which we operated programmatically via the Python package PySPLIT (Warner 2018) to generate a high amount of trajectories.

For each day of the period from the 1st of January 2011 to the 31st of December 2018, we generated 4 backtrajectories up to the previous 96h starting at 00:00, 6:00, 12:00 and 18:00, with the initial point 10 meters over sea surface in central Tokyo (139.65E, 35.68N).

HYSPLIT_DIR = '/home/afontal/utils/hysplit/exec/hyts_std'
HYSPLIT_WORKING = '/home/afontal/utils/hysplit/working'
METEO_DIR = '/home/afontal/utils/hysplit/meteo/gdas1'
OUT_DIR = '/home/afontal/projects/vasculitis2022-conference/output/trajectories'

for year, month in product(range(2011, 2019), range(1, 13)):
    pysplit.generate_bulktraj('tokyo',
                              hysplit_working=HYSPLIT_WORKING,
                              hysplit=HYSPLIT_DIR,
                              output_dir=OUT_DIR,
                              meteo_dir=METEO_DIR,
                              years=[year],
                              months=[month],
                              meteoyr_2digits=True,
                              hours=[0, 6, 12, 18],
                              altitudes=[10],
                              coordinates=(35.68, 139.65),
                              run=-96,
                              meteo_bookends=([4, 5], []))

Below, an extract showing the individual trajectories generated at different points in time:

Weekly counts

Weekly counts

Meteorology data

To model the air trajectories, HYSPLIT requires a set of gridded meteorological data at all pressure levels. In this case, we downloaded the GDAS 1x1°, 3 hour resolution dataset for every week from December 2010 (to be able to generate backtrajectories starting on January 2011) to December 2018. These data can directly be accessed through NOAA’S ARL FTP Server.

Differential Trajectory Analysis

As a first step, we generated a grid of frequency of intersection between lat-lon grid cells and the trajectories associated to KD maxima and KD minima dates, which are a total of 1120 trajectories associated to each group.

all_trajs = pysplit.make_trajectorygroup(glob('../output/trajectories/*'))

all_trajectories_data = (
    pd.concat([t.data.assign(traj_id=i) for i, t in enumerate(all_trajs)])
    .drop(columns=['Temperature_C', 'Temperature', 'Mixing_Depth'])
    .assign(start_time=lambda dd: dd.DateTime - pd.to_timedelta(dd.Timestep, unit='h'))
)

trajectory_lines = (
    all_trajectories_data
    .groupby(['traj_id', 'start_time'], as_index=False)['geometry']
    .apply(lambda x: LineString([(i.x, i.y) for i in x.to_list()]))
)

We first generate a grid of 200 by 200 cells based on the boundaries of the generated trajectories, which ends up looking like a 0.8x0.8° resolution grid like the one displayed in the following image:

margin = 1
xmin, ymin, xmax, ymax = all_trajectories_data.total_bounds.round() + [200, -margin, 0, margin]
n_cells = 200
cell_size = (xmax - xmin) / n_cells
grid_cells = []

for x0 in np.arange(xmin, xmax + cell_size, cell_size):
    for y0 in np.arange(ymin, ymax + cell_size, cell_size):
        x1 = x0 - cell_size
        y1 = y0 + cell_size
        grid_cells.append(shapely.geometry.box(x0, y0, x1, y1))

grid = (gpd.GeoDataFrame(grid_cells, columns=['geometry'])
        .assign(cell=lambda dd: dd.geometry)
        .assign(x=lambda dd: dd.cell.centroid.x, y=lambda dd: dd.cell.centroid.y))

x_y_poly = grid[['x', 'y', 'cell']].drop_duplicates()

Map Grid

Map Grid

Then, we count the number of intersections of the trajectories of each KD group for each of the grid cells, which allows us to visualize longer term or aggregated patterns, like in this figure that displays the different sources associated to KD maxima and minima per calendar year:

min_max_trajectories = (
    trajectory_lines
    .assign(date=lambda dd: pd.to_datetime(dd.start_time.dt.date))
    .merge(min_max_weekly_dates)
)

grid_intersections = gpd.sjoin(min_max_trajectories, grid, op='intersects')

Trajectories per year

Trajectories per year

By comparing, cell-wise, the ratio between KD maxima and KD minima intersections, we can obtain an overview of the source areas overrepresented in association with the timings of either phenomena.

To generate a summarized figure, we use the Log2 transformation of the ratios to generate the image, and colour each cell grid according to this. The transformation defined as:

\[R = \text{Log}_2\cfrac{\text{Intersections}_{max}}{\text{Intersections}_{min}}\]

This allows for a symmetric scale on both sides, generating negative values for areas overrepresented for KD minima, and positive values for areas overrepresented in KD maxima.

(grid_intersections
 .groupby(['x', 'y', 'label'])
 .size()
 .rename('n')
 .reset_index()
 .pivot(['x', 'y'], 'label', 'n')
 .fillna(0)
 .assign(diff=lambda dd: dd['KD Maxima'] - dd['KD Minima'])
 .assign(fc=lambda dd: np.log2(dd['KD Maxima'] / dd['KD Minima']))
 .sort_values('diff')
 .reset_index()
 .merge(x_y_poly)
 .assign(total_n=lambda dd: dd['KD Maxima'] + dd['KD Minima'])
 .loc[lambda dd: dd.total_n >= 1120 * 0.01]
 .pipe(lambda dd: p9.ggplot(dd)
       + p9.geom_map(p9.aes(fill='fc', geometry='cell'), size=.0)
       + p9.geom_map(data=world, size=.1, alpha=.1)
       + p9.scale_fill_continuous('RdBu_r')
       + p9.scale_alpha_continuous(trans='log10')
       + p9.theme(dpi=300, panel_grid=p9.element_blank(),
                  plot_background=p9.element_blank(),
                  legend_background=p9.element_blank(),
                  legend_key_size=8,
                  legend_text=p9.element_text(size=6),
                  legend_title_align='center',
                  legend_title=p9.element_text(size=7))
       + p9.labs(fill='Log$_2$ FC')
       + p9.scale_x_continuous(labels=custom_format('{:0g}°E'), limits=(90, 160))
       + p9.scale_y_continuous(labels=custom_format('{:0g}°N'), limits=(20, 65)))
)

Results

The main findings support the hypothesis that winds from Northeastern Asia are associated and synchronized with periods of high KD Maxima in Tokyo, as the main figure portrays:

Main Figure

Main Figure

Red values represent those areas of more common occurence during weeks of KD maxima with respect to those of KD minima, with blue values representing the opposite.

All in all, there seem to be two main different routes associated to periods of

Limitations

Temporal association study

There are several limitations when trying to relate studies to connect temporal changes between different processes, as finding associations doesn’t necessarily imply a causal link between both phenomena, and external drivers connecting both might be the reason of the link.

Choice of temporal variability

Taking this into account, the temporal changes in KD incidence in Tokyo has three main sources of variability:

  • The increasing trend.
  • The seasonal variation (mainly, of yearly frequence).
  • The anomalies once removed of trend and seasonal variability.

The methods employed here, given the selection of the yearly maxima and minima, only allow us to associate or identify the phenomena related to yearly variation, and would fail to identify the processes that drive the general increase in cases for the last 20 years and the anomalies unrelated to the seasonal variability.

Backtrajectory modelling

The use of HYSPLIT for modelling the trajectories implies that there is a certain degree of error in the estimation of the trajectories, and the magnitude of these is known to increase as the trajectories surpass the planet boundary layer. By not using single trajectories to assess any of the effects but aggregates of many of them, we attempted to mitigate these issues. The use of higher resolution meteorology data might also allow us to improve the accuracy of the modelling.

In this study, we’ve also just considered the 2D intersections of the trajectories with the latitudinal-longitudinal plane, without regard for the actual altitude of the trajectories. To be more thorough, we should include the altitude of the trajectories in each point in time, paired with calculations of moisture uptake rates in order to able to actually pinpoint with better accuracy the areas where the air masses that arrive to the destination are collecting matter.

Conclusions and outlook

This study further confirms, with updated data, the hypothesis connecting the tropospheric winds from NE Asia with higher Kawasaki Disesase incidence periods which was discussed in Rodó et al. (2014), allowing us now to pinpoint with higher accuracy the specific locations that might source the (or one of) the etiologic agents of the disease.

Further studies should (and are being done) on the characterization of these air masses to understand the potential immune-triggering compounds carried in them.

Stay posted for more updated on the issue and feel free to comment below with any doubts or comments.

References

Ae, Ryusuke, Nobuko Makino, Koki Kosami, Masanari Kuwabara, Yuri Matsubara, and Yosikazu Nakamura. 2020. “Epidemiology, Treatments, and Cardiac Complications in Patients with Kawasaki Disease: The Nationwide Survey in Japan, 2017-2018.” The Journal of Pediatrics 225: 23–29.e2. https://doi.org/https://doi.org/10.1016/j.jpeds.2020.05.034.
Burns, Jane C., Lauren Herzog, Olivia Fabri, et al. 2013. “Seasonality of Kawasaki Disease: A Global Perspective.” PLOS ONE 8 (9): e74529. https://doi.org/10.1371/journal.pone.0074529.
Makino, Nobuko, Yosikazu Nakamura, Mayumi Yashiro, et al. 2015. “Descriptive Epidemiology of Kawasaki Disease in Japan, 2011–2012: From the Results of the 22nd Nationwide Survey.” Journal of Epidemiology, JE20140089.
Makino, Nobuko, Yosikazu Nakamura, Mayumi Yashiro, et al. 2018. “Epidemiological Observations of Kawasaki Disease in Japan, 2013–2014.” Pediatrics International 60 (6): 581–87.
Makino, Nobuko, Yosikazu Nakamura, Mayumi Yashiro, et al. 2019. “Nationwide Epidemiologic Survey of Kawasaki Disease in Japan, 2015–2016.” Pediatrics International 61 (4): 397–403.
Rodó, Xavier, Joan Ballester, Dan Cayan, et al. 2011. “Association of Kawasaki Disease with Tropospheric Wind Patterns.” Sci Rep 1 (1): 152. https://doi.org/10.1038/srep00152.
Rodó, Xavier, Roger Curcoll, Marguerite Robinson, et al. 2014. “Tropospheric Winds from Northeastern China Carry the Etiologic Agent of Kawasaki Disease from Its Source to Japan.” Proceedings of the National Academy of Sciences 111 (22): 7952–57. https://doi.org/10.1073/pnas.1400380111.
Stein, A. F., R. R. Draxler, G. D. Rolph, B. J. B. Stunder, M. D. Cohen, and F. Ngan. 2015. NOAA’s HYSPLIT Atmospheric Transport and Dispersion Modeling System.” Bulletin of the American Meteorological Society 96 (12): 2059–77. https://doi.org/10.1175/BAMS-D-14-00110.1.
Warner, Mellissa S. C. 2018. “Introduction to PySPLIT: A Python Toolkit for NOAA ARL’s HYSPLIT Model.” Computing in Science Engineering 20 (5): 47–62. https://doi.org/10.1109/MCSE.2017.3301549.

Comments

Replies are powered by GitHub Discussions.