How to integrate data into Databricks with Windsor.ai
What is Databricks?
Databricks is a cloud-based data analytics platform built on the Apache Spark engine, designed to process large-scale data efficiently. It provides a collaborative environment for data engineers, scientists, and analysts to build complex machine learning models and perform real-time analytics with a fully managed infrastructure.
Databricks’ key advantages include seamless handling of large datasets, cloud scalability, and integration with AWS, Azure, and Google Cloud services. The platform enhances data processing with Delta Lake for reliability and performance, supports multiple programming languages, and offers built-in AI and ML tools. By bringing automated cluster management, cost optimization, and strong security, Databricks levels up data analysis workflows.
By integrating Databricks with the Windsor.ai data movement platform, you can:
- Automatically extract data from multiple sources and connect it to Databricks, enabling advanced big data processing and AI-driven insights.
- Streamline data ingestion, transformation, and analysis, reducing manual work and ensuring real-time updates for informed decision-making.
- Leverage Databricks’ cloud scalability to efficiently process large volumes of data while optimizing costs with automated resource management.
Explore our step-by-step guide to seamlessly integrate your data into Databricks with the Windsor.ai ELT connector.
How to connect Databricks to Windsor.ai
Connecting data in Windsor.ai
1. Create a Windsor.ai account and log in.
2. Select the data source you want to stream data from, e.g., Google Analytics 4 (GA4). Sign in with the associated Google account and proceed to the next step, “Data preview.”
3. You’ll see your Google Analytics 4 data displayed in your Windsor.ai account.
Now, let’s proceed with setting up the Databricks environment for data integration.
Configuring Databricks
1. First of all, make sure you have an active Databricks Developer account. Go to Databricks and log in to your developer account.
2. In the sidebar, select Catalog, then click the “+” icon and choose “Create Catalog.”
3. Enter the Catalog name and click “Create.” The name can be anything you wish, but it should contain only ASCII letters (a-z, A-Z), digits (0-9), and underscores (_), and it should not start with a digit.
4. Go to your newly created catalog and click “Create Schema.” Enter the Schema Name (anything you want) and click “Create.”
5. Get the required fields from Databricks to create the connection between Databricks and Windsor.ai.
First, get the Access Token:
- On the top right corner, click on your account and select Settings.
- Find the Developer section in the sidebar and click “Manage” in the Access Tokens row.
- Click “Generate New Token,” enter a Comment (anything you wish) for your token, and finish with the “Generate” button.
- Copy the created access token.
6. Now get the Server Hostname and HTTP Path.
Find SQL Warehouses in the sidebar, open the warehouse you want to use, select the Connection Details tab, and copy the Server Hostname and HTTP Path.
7. You also need to get the Catalog and Schema.
Find Catalog in the sidebar and copy the Catalog and Schema names.
That's it: you’ve set up the Databricks catalog and schema in the Databricks console and gathered the required credentials.
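If you keep these five values for later reuse, it is safer to store them outside of notebooks and scripts. The following is a minimal sketch, not part of Windsor.ai or Databricks: it reads the values from environment variables and keeps the access token out of printed output. The variable names (`DBX_SERVER_HOSTNAME` and so on), the `DatabricksCredentials` class, and the `load_credentials` helper are all hypothetical names chosen for this example.

```python
import os
from dataclasses import dataclass, field

# Hypothetical environment variable names, chosen for this example only.
ENV_NAMES = {
    "server_hostname": "DBX_SERVER_HOSTNAME",
    "http_path": "DBX_HTTP_PATH",
    "catalog": "DBX_CATALOG",
    "schema": "DBX_SCHEMA",
    "access_token": "DBX_ACCESS_TOKEN",
}


@dataclass(frozen=True)
class DatabricksCredentials:
    """The five values gathered in the Databricks console."""
    server_hostname: str
    http_path: str
    catalog: str
    schema: str
    # repr=False keeps the token out of logs and print() output.
    access_token: str = field(repr=False)


def load_credentials(env=None):
    """Build DatabricksCredentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values()
               if not env.get(name, "").strip()]
    if missing:
        raise KeyError("Missing values: " + ", ".join(missing))
    values = {fld: env[name].strip() for fld, name in ENV_NAMES.items()}
    return DatabricksCredentials(**values)
```

Calling `load_credentials()` with no arguments reads from the process environment; passing a plain dictionary is convenient for testing.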
Now, let’s import your data from Windsor.ai into the created Databricks catalog table.
Sending Windsor.ai data to Databricks
1. Go to your Windsor.ai account and move to the Google Analytics 4 data preview page. Scroll down to data destinations, select Databricks, and click “Add Destination Task.”
2. Enter all the required credentials:
- Task name (any name that reflects the purpose of the integration).
- Access token, server hostname, HTTP path, catalog, and schema you got from the Databricks console.
- Table name (any name that reflects the purpose of the integration; Windsor.ai automatically creates the table in your catalog schema if it doesn't exist). If you already have a table for your Google Analytics data, you can enter that table name.
Click “Test Connection.”
If the connection is set properly, you’ll see a success message at the bottom; otherwise, an error message will appear. When successful, click “Save” in the lower right corner of the form. The data stream to the Databricks table has started.
3. You can now see the task running in the selected data destination section. The green ‘upload’ button with the status ‘ok’ indicates that the task is active and running successfully.
4. Verify that your data is being added to the Databricks table. Go to your Databricks catalog and select the relevant table.
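Besides browsing the catalog in the UI, you can check the table with a row-count query. The sketch below assumes you have installed the separate `databricks-sql-connector` package and that you pass in the same hostname, HTTP path, and token you entered in Windsor.ai. The helper names (`qualified_table`, `build_count_query`, `fetch_row_count`) are illustrative, and the strict name check is an assumption that mirrors the catalog naming rule in this guide; your actual table name may follow different rules.

```python
import re

# Assumption: names use only ASCII letters, digits, and underscores,
# mirroring the catalog naming rule described in this guide.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def qualified_table(catalog, schema, table):
    """Return a backtick-quoted three-part name: `catalog`.`schema`.`table`."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"Unexpected characters in name: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


def build_count_query(catalog, schema, table):
    """Build a query that counts the rows in the destination table."""
    name = qualified_table(catalog, schema, table)
    return f"SELECT COUNT(*) AS row_count FROM {name}"


def fetch_row_count(server_hostname, http_path, access_token,
                    catalog, schema, table):
    """Run the count query against a SQL warehouse and return the number."""
    # Imported here so the helpers above work without the package installed.
    # Requires: pip install databricks-sql-connector
    from databricks import sql

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            cursor.execute(build_count_query(catalog, schema, table))
            return cursor.fetchone()[0]
```

A count that grows between scheduled runs is a simple sign that the task is delivering data.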
Cheers! Your Google Analytics 4 data is now integrated into Databricks and ready for detailed analysis.
FAQs
What are the prerequisites for connecting Windsor.ai to Databricks?
You need active Databricks Developer and Windsor.ai accounts, plus the necessary access credentials from Databricks: an Access Token, Server Hostname, HTTP Path, Catalog, and Schema.
What is the correct way to name a Databricks catalog or schema?
The names should contain only ASCII letters (a-z, A-Z), digits (0-9), or underscores (_) and should not start with a digit.
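The naming rule above can be expressed as a small check. This is a minimal sketch of the rule as stated in this guide, not an official Databricks validator; the function name `is_valid_name` is chosen for illustration.

```python
import re

# First character: ASCII letter or underscore (not a digit).
# Remaining characters: ASCII letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if name follows the catalog/schema rule in this guide."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("ga4_reports"))   # True
print(is_valid_name("4ga_reports"))   # False: starts with a digit
print(is_valid_name("ga4-reports"))   # False: hyphen is not allowed
```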
Can I schedule automated reports from Windsor.ai to Databricks?
Yes, while connecting Windsor.ai to Databricks, you can schedule automated data streams by specifying the schedule type and time.
Where can I see my imported data in Databricks?
Navigate to your Databricks Catalog → Schema → Table, where you can preview the data imported from Windsor.ai.
What credentials do I need to connect Databricks with Windsor.ai?
You’ll need to provide the catalog, schema, server hostname, access token, and HTTP path.
What should I do if the connection test between Windsor.ai and Databricks fails?
Make sure you have the following things in place:
- The correct server hostname, access token, and HTTP path are provided.
- The catalog and schema are already present.
- Windsor.ai doesn’t show any errors while running the task.
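Many failed connection tests come down to copy-and-paste mistakes, which you can catch before retrying. The sketch below is a set of heuristics, not Windsor.ai's actual validation logic: it assumes the server hostname is a bare host without `https://` and that the HTTP path starts with `/`, as these values usually appear on the Connection Details tab. The function name `find_credential_problems` is hypothetical.

```python
from typing import List


def find_credential_problems(server_hostname: str, http_path: str,
                             access_token: str, catalog: str,
                             schema: str) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was found."""
    problems = []
    fields = {
        "server hostname": server_hostname,
        "HTTP path": http_path,
        "access token": access_token,
        "catalog": catalog,
        "schema": schema,
    }
    for label, value in fields.items():
        if not value.strip():
            problems.append(f"{label} is empty")
        elif value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace")

    host = server_hostname.strip()
    # Assumption: the hostname is pasted without a scheme or path.
    if "://" in host or "/" in host:
        problems.append("server hostname should not include a scheme or path")

    path = http_path.strip()
    # Assumption: the HTTP path is copied with its leading slash.
    if path and not path.startswith("/"):
        problems.append("HTTP path should start with '/'")

    return problems
```

These checks only catch formatting slips; an expired token or a missing catalog still needs to be confirmed in the Databricks console.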
How can I ensure the security of my data when using Windsor.ai?
Windsor.ai uses secure OAuth authentication and encrypted data transfer protocols. To limit access to sensitive data, configure your Databricks user roles and privileges.
Can I integrate additional data sources with Databricks via Windsor.ai?
Yes, Windsor.ai supports 315+ data sources, including Facebook Ads, Google Analytics, Salesforce, and other popular platforms. You can connect any data source to Databricks by following a similar procedure for effective cross-channel analysis.
Tired of manually transferring data to Databricks? Try Windsor.ai today to automate the process.

