How to Onboard a Source

Monarch includes a source onboarding agent that configures a source and generates a starter data mapping. The agent writes its draft outputs to disk for human review; it does not promote the generated files into the live product automatically. After review, the developer moves the files into the correct project folders.

Prerequisites

  1. A valid key/token for Claude Code or Codex
  2. The proper environment variables, set before starting the agent Docker containers

Currently, the Monarch agent supports the following backends:

  • Claude Code
  • Codex

And the following auth paths:

  • Bedrock environment variables (Claude via Bedrock), such as:
    • CLAUDE_CODE_USE_BEDROCK
    • AWS_BEARER_TOKEN_BEDROCK
    • AWS_REGION
    • or AWS access key material
  • ANTHROPIC_API_KEY (Claude direct via Anthropic)
  • OPENAI_API_KEY (Codex)

Claude via Bedrock

Use this environment configuration when you want Claude with Bedrock:

unset ANTHROPIC_API_KEY

export CLAUDE_CODE_USE_BEDROCK=1
export AWS_BEARER_TOKEN_BEDROCK='REPLACE_ME'
export AWS_REGION='YOUR_REGION'

unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN

Claude direct via Anthropic

Use this environment configuration when you want Claude without Bedrock:

export ANTHROPIC_API_KEY='REPLACE_ME'

unset CLAUDE_CODE_USE_BEDROCK
unset AWS_BEARER_TOKEN_BEDROCK
unset AWS_REGION
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN

Codex

Use this environment configuration when you want to use Codex:

export OPENAI_API_KEY='REPLACE_ME'
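
Before starting the containers, it can help to confirm which of the auth paths above the current shell is actually configured for. The following is a minimal pre-flight sketch, not part of Monarch: the function name list_auth_paths is hypothetical, and the detection rules (Bedrock requires the flag, a region, and either a bearer token or an access key ID) are an assumption based on the variables listed in this guide, not on how the agent itself resolves credentials.

```shell
# Illustrative pre-flight check (not part of Monarch).
# Prints one line per auth path that appears configured in the current shell.
# Returns 0 if at least one path is detected, 1 otherwise.
list_auth_paths() {
  found=1   # exit status to return: stays 1 until a path is detected

  # Assumption: Bedrock needs the flag and a region, plus either a bearer
  # token or AWS access key material.
  if [ -n "${CLAUDE_CODE_USE_BEDROCK:-}" ] && [ -n "${AWS_REGION:-}" ]; then
    if [ -n "${AWS_BEARER_TOKEN_BEDROCK:-}" ] || [ -n "${AWS_ACCESS_KEY_ID:-}" ]; then
      echo "bedrock"
      found=0
    fi
  fi

  # Claude direct via Anthropic.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic"
    found=0
  fi

  # Codex.
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
    found=0
  fi

  return "$found"
}
```

A typical use is `list_auth_paths || echo "no auth path configured" >&2`. If more than one line is printed, leftover variables from another configuration are still set, which is the situation the unset lines above are meant to prevent.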

Process

  1. Create a schema to house the raw data and load that data into tables
  2. Run DAG 01_onboard_source, providing the schema from step one as the "Source Schema" parameter

  3. During the run, the logs will show a run_id. This id identifies the iteration of the agent run, and all outputs of the run will live in the custom/source-onboarding-runs/{run_id} directory.
    • Note: This run can take anywhere from 20 to 45 minutes. The DAG will keep polling until the run completes.
  4. The agent runs the evaluation as part of its process, and the DAG runs the evaluation again so that the result is output in Airflow.
  5. Once the DAG run is complete, visit the src/custom/source-onboarding-runs/{run_id}/onboarding directory to view configs, copy_scripts, the transformation dbt project, and additional notes. The agent also runs dbt transform as part of its process, so you can view the resulting tables directly in the database.
  6. Review the files and copy them into their corresponding folders. At this point the onboarding is complete and you can proceed with other Monarch steps.
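
To make the review step concrete, the sketch below lists every draft file a run produced so that nothing is missed before copying. It is a convenience helper under stated assumptions, not a Monarch command: the function names onboarding_dir and list_drafts are hypothetical, and the optional second argument assumes the path from the steps above is relative to the repository root.

```shell
# Print the draft output directory for a given run_id.
# $1 = run_id from the DAG logs
# $2 = repository root (assumption: defaults to the current directory)
onboarding_dir() {
  run_id="$1"
  repo_root="${2:-.}"
  echo "${repo_root}/src/custom/source-onboarding-runs/${run_id}/onboarding"
}

# List every draft file from a run, sorted, for review before copying.
# Returns 1 with a message on stderr if the directory does not exist.
list_drafts() {
  dir="$(onboarding_dir "$1" "${2:-.}")"
  if [ ! -d "$dir" ]; then
    echo "no onboarding output found at $dir" >&2
    return 1
  fi
  find "$dir" -type f | sort
}
```

For example, `list_drafts "$RUN_ID"` run from the repository root prints the configs, copy_scripts, dbt project files, and notes for that run. The helper deliberately stops at listing: where each file belongs is a review decision, so it does not copy anything.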

Cancelling a Run

  1. Make a note of the run_id from the active run. You can find it in the logs of the start_agent_run or poll_agent_run task
  2. Mark the poll_agent_run task as failed to stop the DAG
  3. Trigger DAG 01_onboard_source again, but this time enable the "Cancel" flag and fill out the "Run ID"
  4. The DAG will cancel the run
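
If you prefer the Airflow CLI to the UI for step 3, the trigger could be sketched as below. The `airflow dags trigger` command and its `--conf` option are standard Airflow, but the JSON keys "cancel" and "run_id" are hypothetical stand-ins for the "Cancel" flag and "Run ID" field; check the DAG's parameter definitions for the real names before relying on this. The helper name cancel_conf is also an assumption.

```shell
# Build the JSON configuration for a cancel trigger.
# The keys "cancel" and "run_id" are HYPOTHETICAL; confirm the real
# parameter names in the 01_onboard_source DAG definition.
# $1 = run_id of the active run (assumed to contain no quotes or backslashes)
cancel_conf() {
  printf '{"cancel": true, "run_id": "%s"}' "$1"
}

# Usage (requires the Airflow CLI and access to the Airflow environment):
#   airflow dags trigger 01_onboard_source --conf "$(cancel_conf "$RUN_ID")"
```

Building the JSON in a small function keeps the quoting in one place, which is where hand-typed `--conf` strings usually go wrong.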

Further Reading

For more in-depth information, see the operator_docs in the codebase.