How to integrate data into Databricks with Windsor.ai

What is Databricks?

Databricks is a cloud-based data analytics platform built on the Apache Spark engine, designed to process large-scale data efficiently. It provides a collaborative environment for data engineers, scientists, and analysts to build complex machine learning models and perform real-time analytics with a fully managed infrastructure.

Databricks’ key advantages include seamless handling of large datasets, cloud scalability, and integration with AWS, Azure, and Google Cloud services. Windsor.ai supports all three clouds as well. The platform enhances data processing with Delta Lake for reliability and performance, supports multiple programming languages, and offers built-in AI and ML tools. With automated cluster management, cost optimization, and strong security, Databricks levels up data analysis workflows.

By integrating Databricks with the Windsor.ai data movement platform, you can:

  • Automatically extract data from multiple sources and connect it to Databricks, enabling advanced big data processing and AI-driven insights.
  • Streamline data ingestion, transformation, and analysis, reducing manual work and ensuring real-time updates for informed decision-making.
  • Leverage Databricks’ cloud scalability to efficiently process large volumes of data while optimizing costs with automated resource management.

Explore our step-by-step guide to seamlessly integrate your data into Databricks with the Windsor.ai ELT connector.

Connecting data in Windsor.ai

1. Create a Windsor.ai account and log in.

2. Select the data source from which you want to stream data, e.g., Google Analytics 4 (GA4). Sign in with your associated Google account and select the next step, “Data preview.”

selecting data source in windsor.ai

3. You’ll see your Google Analytics 4 data displayed in your Windsor.ai account. 

Now, let’s proceed with setting up the Databricks environment for data integration.

Configuring Databricks

1. First, make sure you have an active Databricks developer account, then go to Databricks and log in.

2. In the sidebar, select Catalog, then click the “+” icon and choose “Create Catalog.”

Create Catalog in Databricks

3. Enter the Catalog name and click “Create.” The name can be anything you wish, but it may contain only ASCII letters (a–z, A–Z), digits (0–9), and underscores (_).

catalog name in databricks
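If you are scripting your setup, the naming rule above is easy to check programmatically before clicking “Create.” A minimal Python sketch (the function name is ours, not part of any Databricks API):

```python
import re

# Matches the rule from the step above: only ASCII letters,
# digits, and underscores are allowed in the catalog name.
CATALOG_NAME_RE = re.compile(r"[A-Za-z0-9_]+")

def is_valid_catalog_name(name: str) -> bool:
    """Return True if the name uses only ASCII letters, digits, and underscores."""
    return bool(CATALOG_NAME_RE.fullmatch(name))

print(is_valid_catalog_name("ga4_marketing_data"))  # True
print(is_valid_catalog_name("ga4-marketing-data"))  # False: hyphens not allowed
```

The same rule applies to the schema name you will create in the next step.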

4. Go to your newly created catalog and click “Create Schema.” Enter the Schema Name (anything you want) and click “Create.”

create a schema in databricks

5. Now get the Server Hostname and HTTP Path.

Find SQL Warehouses in the sidebar, select the Connection Details tab, and copy the Server Hostname and HTTP Path.

Server Hostname and HTTP Path in Databricks

6. You also need to get the Catalog and Schema names.

Find Catalog in the sidebar and copy the Catalog and Schema names.

Catalog and Schema in Databricks

You’ve now set up the Databricks catalog and schema in the Databricks console; it’s time to gather the credentials required to authorize the connection between Databricks and Windsor.ai.
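If you plan to reuse these connection details in your own scripts, a small helper can sanity-check them before you paste them anywhere. This is a sketch under our own assumptions (the helper name and placeholder values are ours; the exact hostname format varies by cloud, e.g. *.cloud.databricks.com on AWS or *.azuredatabricks.net on Azure):

```python
def sql_warehouse_conn_kwargs(server_hostname: str, http_path: str,
                              access_token: str) -> dict:
    """Assemble keyword arguments for a Databricks SQL connection.

    Performs light sanity checks only; it does not contact Databricks.
    """
    if server_hostname.startswith("https://"):
        # Connection settings expect a bare hostname, not a full URL.
        server_hostname = server_hostname[len("https://"):]
    if not http_path.startswith("/sql/"):
        raise ValueError("HTTP Path should look like /sql/1.0/warehouses/<id>")
    return {
        "server_hostname": server_hostname,
        "http_path": http_path,
        "access_token": access_token,
    }

kwargs = sql_warehouse_conn_kwargs(
    "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com",  # placeholder hostname
    "/sql/1.0/warehouses/abcdef0123456789",            # placeholder warehouse path
    "dapi-example-token",                              # placeholder token
)
print(kwargs["server_hostname"])  # dbc-a1b2c3d4-e5f6.cloud.databricks.com
```

Stripping the `https://` prefix is a common stumbling block: the Connection Details tab shows a bare hostname, but people often paste a full URL.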

Define and set your authentication method

To connect Databricks with Windsor.ai, you can choose between two authentication methods: a Personal Access Token or OAuth 2.0.

💡 Use Personal Access Token if you’re in a dev/test environment. Use OAuth 2.0 if your organization enforces secure login.

The authorization flow will vary depending on which method you select.

Method 1. Via a Personal Access Token

To get the Personal Access Token in Databricks:

1. In the top right corner, click on your account and select Settings.

databricks settings

2. Find the Developer section in the sidebar and click “Manage” in the Access Tokens row.

manage Access Tokens in Databricks

3. Click “Generate New Token,” enter a Comment (anything you wish) for your token, and finish with the “Generate” button. 

get new access token in databricks

4. Copy the created access token for future use in Windsor.ai.

copy access token in databricks
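With the token copied, you can also verify connectivity outside Windsor.ai using the open-source `databricks-sql-connector` Python package. A minimal sketch (the helper name is ours; install the package and substitute your own hostname, HTTP path, and token):

```python
def run_databricks_query(server_hostname: str, http_path: str,
                         access_token: str, query: str):
    """Run one query against a Databricks SQL warehouse and return the rows.

    Requires `pip install databricks-sql-connector`; imported lazily so the
    helper can be defined without the package installed.
    """
    from databricks import sql  # lazy import of the Databricks SQL connector
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()

# Example usage (needs a live SQL warehouse and a valid token):
# rows = run_databricks_query(
#     "dbc-a1b2c3d4-e5f6.cloud.databricks.com",   # placeholder hostname
#     "/sql/1.0/warehouses/abcdef0123456789",     # placeholder HTTP path
#     "dapi...",                                  # your personal access token
#     "SELECT current_catalog(), current_schema()",
# )
```

If this query succeeds, the same hostname, path, and token will work when you enter them in Windsor.ai below.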

Method 2. Via OAuth 2.0

Use this method if your Databricks workspace is configured with OAuth 2.0 authentication (available only for paid accounts). 

1. In your Databricks account console, go to the App connections page and click the “Add connection” button.

2. Add https://onboard.windsor.ai/integrations/databricks/callback as a redirect URL.

3. Save the “Client ID” and “Client Secret” values.

4. You can find the “Workspace Host” value in the Workspaces section of the account console.

Now, let’s import your data from Windsor.ai into the created Databricks catalog table using the preferred auth method.

Sending Windsor.ai data to Databricks

1. Return to your Windsor.ai account and move to the data preview page. Scroll down to data destinations, select Databricks, and click “Add Destination Task.”

Databricks destination in windsor.ai

2. Enter all the required credentials: 

  • Task name (you can provide any based on the data integration purpose).
  • Authentication type: via personal access token or OAuth 2.0 (provide the appropriate credentials for the chosen method; retrieve these from your Databricks account beforehand).
  • Server hostname, HTTP path, catalog, table, and schema you got from your Databricks developer console.

destination task databricks windsor

Click “Test Connection.”

If the connection is set up properly, you’ll see a success message at the bottom; otherwise, an error message will appear. When successful, click “Save.” The data stream to the Databricks table has started.

3. You can now see the task running in the selected data destination section. The green “upload” button with the status “ok” indicates that the task is active and running successfully.

successful data integration into databricks

4. Verify that your data is being added to the Databricks table. Go to your Databricks catalog and select the relevant table.

integrating data into databricks

Cheers! Your data is now integrated into Databricks and ready for detailed analysis.

FAQs

What are the key steps to connect Windsor.ai with Databricks?

To sync Windsor.ai data with Databricks, start by connecting a data source in Windsor.ai. In parallel, set up the Databricks catalog and schema in the Databricks developer console. Next, choose Databricks as the data destination in Windsor.ai and enter the required credentials. Test the connection to ensure it’s set up correctly and save the configuration. Once completed, Windsor.ai will start streaming data seamlessly to your Databricks table.

Tired of manually transferring data to Databricks? Try Windsor.ai today to automate the process.

Access all your data from various sources in one place. Get started for free with a 30-day trial.