Flatten a JSON Object Quickly using json_normalize

December 14, 2023

While building out my data lake, one helpful function I took advantage of was json_normalize. This helped me flatten out the responses from various API endpoints while creating my raw tables. Helping preserve the original object and key names when blending and processing the data down the line.

The data I’m working has roughly this structure, there are some further nested values that I don’t need to break out at this phase, and so I’ll let json_normalize pass those objects as strings. But in general you can see what I’m trying to accomplish.

{'sports':[{'leagues':'id','uid','teams':[{...}]}]

The snippet below shows how I’ve been able to take my JSON response in ‘leagueData’ above selecting ‘leagues’, then flatten it by the keys within the ‘teams’ object while retaining the parent level values via the ‘meta’ parameter.

df = pd.json_normalize(data=leagueData['leagues'], record_path='teams', meta=['id','uid','name','abbreviation','shortName','slug'])
df.columns = df.columns.str.replace('.', '_')

The second line replaces the dot notation with underscores for readability.

You can break out the further nested objects into another table, then join the all the tables later to fully flatten the entire response at once.

I won’t go into further detail, there are a lot of resources and other options available, but there will always be some sort of customization required.