Dagster with Spotify Ads API

I work at a marketing agency, and reporting from janky marketing tools can be quite the challenge. Spotify Ads is one of those sources that previously required manual CSV exports and formatting to report to the client. That process sucked up time that could have been spent on analysis or other higher-value work for the client.

As a data practitioner, I have done my fair share of Python-based data ingestion from REST APIs into a destination (S3, BigQuery, Postgres). Depending on the scenario, I would run those scripts on cloud infrastructure or on that laptop in the corner of the office. Data ingestion feels like one of those things that should be a solved problem, but until it is, we either pay for a service like Fivetran or build our own pipeline. For this project I wanted to experiment with Dagster as a data orchestration tool. What’s cool about working with data tooling over the past few years is how good it has gotten, so you can focus less on technical problems and more on the people/process problems we know and love.

We use BigQuery as our data warehouse at work and dbt as our ELT tool, so I used Dagster’s integrations for both. The repo for this project can be found here.

Spotify Ads API Data Ingestion

The Spotify Ads API has better documentation than some APIs, which saved a ton of time. My workflow when integrating an API is as follows:

  • Auth/Re-Auth (see the token sketch after this list)
  • Get Dimension data
  • Get Fact Data 
  • Load to Target
  • Initial Load
  • Incremental Refresh
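
For the auth/re-auth step, the flow is Spotify’s standard OAuth token exchange: trade a long-lived refresh token for a short-lived access token before pulling any data. Here is a minimal sketch, assuming the standard accounts-service token endpoint and a refresh token you have already obtained; your client ID and secret are placeholders:

```python
import base64

import requests

TOKEN_URL = "https://accounts.spotify.com/api/token"  # Spotify accounts service


def get_access_token(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Exchange a stored refresh token for a fresh access token."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    resp = requests.post(
        TOKEN_URL,
        headers={"Authorization": f"Basic {basic}"},
        data={"grant_type": "refresh_token", "refresh_token": refresh_token},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```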

After looking at the schema and gathering our client requirements, I knew the data assets we wanted to build were accounts, campaigns, ads, full ad stats history, and incremental ad stats history (the past two months, to capture any adjustments). I then used these entities as the basis for my Dagster assets in Python (code can be found here). Ideally, this code will save someone a bunch of time.
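
As a rough illustration of how those entities map to Dagster assets, here is a hedged sketch of an accounts asset and the incremental ad-stats asset returning pandas DataFrames. The base URL, endpoint paths, response keys, column names, and the SPOTIFY_ACCESS_TOKEN environment variable are all placeholders; the real code lives in the repo linked above.

```python
import os
from datetime import date, timedelta

import pandas as pd
import requests
from dagster import asset

# Placeholder base URL -- check the Spotify Ads API reference for the real one.
BASE_URL = "https://api-partner.spotify.com/ads"


def _auth_headers() -> dict:
    # A fresh access token; see the auth sketch above for one way to obtain it.
    return {"Authorization": f"Bearer {os.environ['SPOTIFY_ACCESS_TOKEN']}"}


@asset
def accounts() -> pd.DataFrame:
    """Dimension data: the ad accounts visible to the authenticated user."""
    resp = requests.get(f"{BASE_URL}/ad_accounts", headers=_auth_headers(), timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json()["ad_accounts"])  # hypothetical response key


@asset
def ad_stats_incremental(accounts: pd.DataFrame) -> pd.DataFrame:
    """Fact data: the last ~two months of ad stats, re-pulled to capture adjustments."""
    start = (date.today() - timedelta(days=60)).isoformat()
    end = date.today().isoformat()
    frames = []
    for account_id in accounts["id"]:  # hypothetical column name
        resp = requests.get(
            f"{BASE_URL}/ad_accounts/{account_id}/ad_stats",  # illustrative path
            headers=_auth_headers(),
            params={"start_date": start, "end_date": end},
            timeout=30,
        )
        resp.raise_for_status()
        frames.append(pd.DataFrame(resp.json()["ad_stats"]))  # hypothetical key
    return pd.concat(frames, ignore_index=True)
```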

Note: If you authenticate successfully but get server errors when sending test requests, you need to complete the API waiver form and wait an hour.

BigQuery I/O Manager

Since we use BigQuery for our data warehouse, I used Dagster’s BigQuery I/O manager to read and write pandas DataFrame objects to BigQuery tables. Using an I/O manager dramatically simplifies the code, as you don’t need to write the read and write statements yourself.
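
Wiring it up is mostly configuration. A minimal sketch using the BigQueryPandasIOManager from the dagster-gcp-pandas package (the project and dataset names are placeholders, and the asset list assumes the assets sketched earlier):

```python
from dagster import Definitions
from dagster_gcp_pandas import BigQueryPandasIOManager

# Each asset that returns a pandas DataFrame is written to a BigQuery table
# named after the asset, and loaded back when a downstream asset requests it.
defs = Definitions(
    assets=[accounts, ad_stats_incremental],  # placeholder asset list
    resources={
        "io_manager": BigQueryPandasIOManager(
            project="my-gcp-project",  # placeholder GCP project
            dataset="spotify_ads",     # placeholder BigQuery dataset
        ),
    },
)
```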

Note: The Dagster BigQuery integration requires Google Cloud authentication credentials. You can acquire these from this link.

dbt in Dagster

dbt is one of those tools I was immediately hooked on once I started using it. Dagster’s dbt integration was easy to get up and running, and being able to have the dbt CLI embedded in your data tooling is an added element of stickiness for me personally. Once I got the configuration set up, it was really fast to build models and reference Dagster assets. I just built some basic tables for reporting, but this sets us up for future complexity. dbt tests are also easy to run through Dagster, which reminds me of this project.
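
The setup follows the standard dagster-dbt pattern of pointing a dbt assets definition at the project’s compiled manifest. A minimal sketch, with placeholder paths:

```python
from pathlib import Path

from dagster import AssetExecutionContext, Definitions
from dagster_dbt import DbtCliResource, dbt_assets

DBT_PROJECT_DIR = Path("dbt_project")  # placeholder path to the dbt project


@dbt_assets(manifest=DBT_PROJECT_DIR / "target" / "manifest.json")
def spotify_ads_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Runs `dbt build` and streams each model/test result back to Dagster
    # as an asset materialization or check.
    yield from dbt.cli(["build"], context=context).stream()


defs = Definitions(
    assets=[spotify_ads_dbt_assets],
    resources={"dbt": DbtCliResource(project_dir=str(DBT_PROJECT_DIR))},
)
```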

Note: To make the Dagster Python assets upstream of the dbt assets, you can declare them as dbt sources like this file and reference them in your models like this: {{ source('spotify_ads', 'accounts') }}.

Final Thoughts

The development experience was pretty slick, and I’m going to continue to use Dagster for our orchestration needs. The built-in visibility, rich metadata, and scalability are quite the bonus. Developing all of that internally wouldn’t add much value, so you can focus on building the assets, jobs, and models for effective delivery.
