
Connecting Databricks to AccountAim

AccountAim's integration with Databricks allows you to directly connect to your Databricks workspace, enabling you to leverage your existing data infrastructure and bring together all of your GTM data in one place.

In this guide, you'll learn how to connect Databricks to AccountAim.


Integration Overview

AccountAim's integration with Databricks connects directly to your Databricks workspace via a JDBC/ODBC connection or the Databricks SQL API. This integration allows you to:

  • Query data directly from your Databricks tables and views
  • Sync specific tables, views, or schemas (databases) to AccountAim
  • Keep your data synchronized on a schedule you control
  • Leverage AccountAim's analytics capabilities on top of your Databricks data

Only one AccountAim Admin needs to configure the connection to set up the integration.

Prerequisites

Before connecting Databricks to AccountAim, ensure you have:

  • Databricks Workspace: Access to a Databricks workspace with appropriate permissions
  • SQL Warehouse or Cluster: A SQL warehouse (serverless or pro) or a running cluster that can be used for queries
  • User Credentials or Personal Access Token: A Databricks user account with read permissions, or a personal access token
  • Network Access: Ensure your Databricks workspace allows connections from AccountAim's IP addresses (if IP access lists are enabled)

Required Permissions

The Databricks user account or service principal used for the connection should have the following permissions:

  • CAN USE permission on the SQL warehouse or cluster
  • SELECT permission on the catalogs, schemas (databases), and tables you want to access
  • USE CATALOG permission on the catalogs you want to access
  • USE SCHEMA permission on the schemas you want to access

For Unity Catalog workspaces:

  • SELECT privilege on the tables/views you want to sync
  • USE CATALOG privilege on the catalogs you want to access
  • USE SCHEMA privilege on the schemas you want to access
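In a Unity Catalog workspace, the privileges above can be granted with SQL statements along these lines. The catalog (`main`), schema (`gtm`), and principal name are placeholders; substitute your own:

```sql
-- Placeholders: replace main, gtm, and the principal name with your own.
GRANT USE CATALOG ON CATALOG main TO `accountaim-service-principal`;
GRANT USE SCHEMA ON SCHEMA main.gtm TO `accountaim-service-principal`;
-- SELECT on a schema covers all current and future tables in it.
GRANT SELECT ON SCHEMA main.gtm TO `accountaim-service-principal`;
```

Granting SELECT at the schema level keeps the grant list short; grant on individual tables instead if you want tighter scoping.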

tip

For security best practices, consider creating a dedicated Databricks service principal or user specifically for AccountAim with only the minimum required permissions.

Connection Parameters

When setting up the Databricks connection, you'll need to provide:

  • Workspace URL: Your Databricks workspace URL (e.g., https://[workspace].cloud.databricks.com)
  • HTTP Path: The HTTP path for your SQL warehouse or cluster
  • Authentication Method: Choose between Personal Access Token or Username/Password
  • Personal Access Token (if using token): Your Databricks personal access token
  • Username (if using username/password): Your Databricks username
  • Password (if using username/password): Your Databricks password
  • Catalog (optional): The default catalog to connect to (for Unity Catalog)
  • Schema/Database (optional): The default schema/database to use
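Two of these values have a specific expected shape: the workspace URL must include `https://` with no trailing slash, and the HTTP path follows a fixed pattern. As a rough sketch, a check like the following (the function name and hex-only warehouse-ID pattern are our assumptions, not part of AccountAim) can catch the most common typos before you paste values into the form:

```python
import re

def check_connection_params(workspace_url: str, http_path: str) -> list[str]:
    """Return a list of problems with the connection parameters (empty = OK)."""
    problems = []
    if not workspace_url.startswith("https://"):
        problems.append("workspace URL must include the https:// protocol")
    if workspace_url.endswith("/"):
        problems.append("workspace URL must not end with a trailing slash")
    # Warehouse IDs are typically lowercase hex strings (an assumption here).
    if not re.fullmatch(r"/sql/1\.0/warehouses/[0-9a-f]+", http_path):
        problems.append("HTTP path should look like /sql/1.0/warehouses/<warehouse-id>")
    return problems

# A well-formed pair passes every check:
print(check_connection_params(
    "https://dbc-example.cloud.databricks.com",   # placeholder workspace
    "/sql/1.0/warehouses/abc123def456",           # placeholder warehouse ID
))  # []
```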

How to connect Databricks

To connect Databricks to your AccountAim workspace, navigate to the Warehouse section and click "Add New" to add a new data source.

Step 1: Set up Databricks Access

Before connecting in AccountAim, ensure you have the necessary access:

  1. Create a Personal Access Token (Recommended):

    • Go to your Databricks workspace
    • Click on your user icon → User Settings → Access Tokens
    • Click "Generate New Token"
    • Enter a comment (e.g., "AccountAim Integration")
    • Set an expiration date (or leave blank for no expiration)
    • Click "Generate"
    • Copy the token immediately (you won't be able to see it again)
  2. Or Prepare Username/Password:

    • Ensure you have a Databricks user account with appropriate permissions
    • Note your username and password
  3. Get SQL Warehouse HTTP Path:

    • Go to SQL → SQL Warehouses in your Databricks workspace
    • Click on the SQL warehouse you want to use
    • Under "Connection details", open the "JDBC/ODBC" tab and copy the "Server hostname" and "HTTP path" values
    • Or use the HTTP path format: /sql/1.0/warehouses/[warehouse-id]
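If you want to sanity-check the values gathered above outside of AccountAim, the open-source `databricks-sql-connector` package takes the same three inputs. A minimal sketch (all values are placeholders, and the live `connect` call is commented out because it needs real credentials):

```python
def to_connect_kwargs(workspace_url: str, http_path: str, token: str) -> dict:
    """Map the values collected above onto databricks-sql-connector arguments.

    The connector expects a bare hostname, so the https:// prefix is stripped.
    """
    return {
        "server_hostname": workspace_url.removeprefix("https://").rstrip("/"),
        "http_path": http_path,
        "access_token": token,
    }

kwargs = to_connect_kwargs(
    "https://dbc-example.cloud.databricks.com",  # placeholder workspace URL
    "/sql/1.0/warehouses/abc123def456",          # placeholder warehouse ID
    "dapiXXXXXXXX",                              # placeholder access token
)
print(kwargs["server_hostname"])  # dbc-example.cloud.databricks.com

# With real values, a one-row smoke test would look roughly like:
# from databricks import sql   # pip install databricks-sql-connector
# with sql.connect(**kwargs) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT 1")
#         print(cur.fetchone())
```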

Step 2: Select Databricks as your Source

In the "Add New Record Source" modal, select "Databricks" as your data source type.

Step 3: Choose Authentication Method

Select your preferred authentication method:

  • Personal Access Token (Recommended): Use a Databricks personal access token
  • Username/Password: Use Databricks username and password

Step 4: Enter Connection Details

If using Personal Access Token:

  1. Workspace URL: Enter your Databricks workspace URL

    • Format: https://[workspace].cloud.databricks.com
    • Include the protocol (https://) but not a trailing slash
  2. HTTP Path: Enter the HTTP path for your SQL warehouse

    • Format: /sql/1.0/warehouses/[warehouse-id]
    • You can find this in SQL → SQL Warehouses → Connection details
  3. Personal Access Token: Enter your Databricks personal access token

    • Keep this secure and never share it publicly
  4. Catalog (optional): Enter the default catalog name (for Unity Catalog workspaces)

  5. Schema/Database (optional): Enter the default schema/database name

If using Username/Password:

  1. Workspace URL: Enter your Databricks workspace URL

  2. HTTP Path: Enter the HTTP path for your SQL warehouse

  3. Username: Enter your Databricks username

  4. Password: Enter your Databricks password

  5. Catalog (optional): Enter the default catalog name

  6. Schema/Database (optional): Enter the default schema/database name

Step 5: Test Connection

Click "Test Connection" to verify that AccountAim can successfully connect to your Databricks workspace.

tip

If the connection test fails, verify:

  • Your workspace URL is correct
  • Your HTTP path is correct and the SQL warehouse is running
  • Your personal access token is valid and not expired
  • Your username/password credentials are correct
  • The SQL warehouse or cluster is not suspended
  • Network access is properly configured (if IP access lists are enabled)
  • You have the required permissions on the catalogs, schemas, and tables
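The checklist above can be hard to work through one item at a time. A hypothetical triage helper like the one below maps common failure-message fragments to the likely fix; the message fragments are illustrative, not an exhaustive list of Databricks error strings:

```python
# Hypothetical mapping from common connection-test failure text to the
# checklist above. The fragments are illustrative examples, not the exact
# strings Databricks or AccountAim will emit.
HINTS = {
    "getaddrinfo": "Check the workspace URL for typos.",
    "403": "Token may be invalid/expired, or IP access lists may block AccountAim.",
    "404": "Check the HTTP path; it must match an existing SQL warehouse.",
    "PERMISSION_DENIED": "Grant USE CATALOG / USE SCHEMA / SELECT on the objects.",
    "timed out": "The warehouse may be suspended or still starting up.",
}

def triage(error_message: str) -> str:
    """Return the first matching hint, or a fallback pointer to the checklist."""
    for fragment, hint in HINTS.items():
        if fragment in error_message:
            return hint
    return "See the full checklist above."

print(triage("HTTP 403: invalid token"))
```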

Step 6: Configure Sync Settings

After a successful connection, configure your sync settings:

  • Sync Frequency: Choose how often AccountAim should sync data from Databricks
    • Options: Manual, Hourly, Daily, Weekly
  • Objects to Sync: Select which tables, views, or schemas you want to sync
  • Sync Mode: Choose between full sync or incremental sync (if supported)
  • Query Timeout: Set a timeout for queries (optional, defaults to 5 minutes)
  • Catalog Selection: If using Unity Catalog, select which catalogs to sync from

Step 7: Complete Setup

Click "Save" to complete the Databricks connection setup. AccountAim will perform an initial sync to import your selected data.


Sync Schedule

Customers have the ability to set the sync frequency in AccountAim to control data freshness and compute costs. By default, automatic syncing is enabled with a daily schedule.

To switch to manual syncing, or to change the schedule, navigate to the Settings page in the AccountAim app and adjust the sync schedule for your Databricks connection.

tip

Consider your Databricks SQL warehouse usage and costs when setting sync frequency. More frequent syncs will result in more compute time and higher costs. SQL warehouses are billed based on compute time, so optimize your sync schedule accordingly.

Supported Objects

AccountAim can connect to and sync data from:

  • Tables: All Databricks tables (managed and external)
  • Views: All Databricks views
  • Schemas/Databases: Entire schemas can be selected for syncing
  • Catalogs: Unity Catalog catalogs (for workspaces using Unity Catalog)
  • Delta Tables: Support for Delta Lake tables with automatic optimization
  • Parquet Tables: Support for Parquet format tables
  • External Tables: Support for external tables pointing to cloud storage

Best Practices

  1. Use Personal Access Tokens: Personal access tokens are more secure and manageable than username/password authentication
  2. Use SQL Warehouses: SQL warehouses (serverless or pro) are recommended over clusters for SQL queries
  3. Minimize Permissions: Grant only the minimum required permissions to your Databricks user or service principal
  4. Selective Syncing: Only sync the tables and schemas you actually need in AccountAim
  5. Monitor Costs: Keep an eye on your Databricks compute costs and adjust sync frequency as needed
  6. Use Filters: Configure table filters to only sync relevant data (e.g., date ranges)
  7. Optimize Queries: For large tables, use incremental syncs or date filters to minimize data processing
  8. Unity Catalog: If using Unity Catalog, ensure proper catalog and schema permissions are configured
  9. Network Security: If using IP access lists, ensure AccountAim's IP addresses are allowed
  10. Token Management: Rotate personal access tokens regularly and use tokens with appropriate expiration dates

Troubleshooting

Connection Issues

If you're experiencing connection problems:

  • Verify your workspace URL is correct (check for typos)
  • Check that your HTTP path is correct and matches your SQL warehouse
  • Ensure your personal access token is valid and not expired
  • Verify your username/password credentials are correct
  • Check that the SQL warehouse is running and not suspended
  • Ensure network access is properly configured if IP access lists are enabled
  • Verify you have the required permissions on the catalogs, schemas, and tables

Sync Issues

If data isn't syncing properly:

  • Verify the user has SELECT permissions on the objects you're trying to sync
  • Check that the tables/schemas exist and are accessible
  • Review sync logs in AccountAim for specific error messages
  • Ensure you have sufficient Databricks compute credits or capacity
  • Check for any data type compatibility issues
  • Verify Unity Catalog permissions if using Unity Catalog

Cost Optimization

To reduce Databricks costs:

  • Use SQL warehouses instead of clusters for SQL queries (more cost-effective)
  • Reduce sync frequency for large tables
  • Use table filters to limit the amount of data processed
  • For Delta tables, use incremental syncs to only process new data
  • Consider using serverless SQL warehouses for intermittent workloads
  • Monitor your SQL warehouse usage and auto-stop settings
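The incremental-sync idea boils down to only scanning rows past a stored watermark, so each sync reads new data rather than the whole table. A sketch of how such a query could be built (the table name `main.gtm.accounts` and watermark column `updated_at` are hypothetical):

```python
from datetime import datetime, timezone

def incremental_query(table: str, watermark_col: str, last_synced: datetime) -> str:
    """Build a query that only scans rows newer than the last sync.

    `table` and `watermark_col` are hypothetical names. On Delta tables, a
    filter on a well-chosen column lets Databricks skip unchanged files,
    which is where the compute savings come from.
    """
    ts = last_synced.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > TIMESTAMP '{ts}'"
    )

q = incremental_query(
    "main.gtm.accounts",  # hypothetical Unity Catalog table
    "updated_at",         # hypothetical watermark column
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(q)
```

After each successful sync, the stored watermark would advance to the maximum `updated_at` value seen, so the next run starts where this one left off.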

Performance Issues

If queries are slow or timing out:

  • Ensure your SQL warehouse is appropriately sized for your data volume
  • Use date filters or incremental syncs for large tables
  • Consider partitioning large tables in Databricks
  • Review query execution plans in Databricks
  • Increase query timeout settings if needed
  • Use cluster caching for frequently accessed data

For additional support, contact AccountAim support.