Finding Data with AI
While we're currently updating our videos to reflect mimiLabs' recent advancements, the fundamental principles remain relevant. We appreciate your patience as we work to bring you the most up-to-date content.
In this section, we will discuss how to get started with the mimilabs data lakehouse and how to find the right datasets for your question with the help of AI.
You can watch the videos below to learn more about finding data with AI.
Getting Started
Hey there, builders at Mimilabs! I have some exciting news to share with you. We now have access to our very own workspace, Mimilabs workspace number one. To get started, all you need is the workspace URL, email address, and password. Just click the sign-in button, and you'll be taken to the first screen with menus on the left and top. Don't worry, they may look overwhelming, but they're actually quite simple. I'll guide you through each one step by step. Get ready for an amazing journey!
Exploring Catalogs, Schemas, and Tables
In this video, I will show you how to navigate the catalog menu in Databricks. The catalog is a collection of databases (also called schemas), and it provides access to databases from various federal agencies and private companies. As a new user, you will have access to two databases, but as you explore the workspace and write code, you will gain access to more. I will start by demonstrating the NPPES database and explaining its schema and table descriptions. No action is required from you at this time.
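Databricks refers to a table by a three-level name, `catalog.schema.table`. The minimal Python sketch below models that hierarchy as nested dictionaries and builds the fully qualified name of each table, quoting every part with backticks as Databricks SQL allows. The catalog name `mimi_ws_1` is a hypothetical placeholder (check the catalog menu for the real one); only the `nppes` schema and its `npidata` table come from this page.

```python
# Hypothetical model of the catalog -> schema -> table hierarchy.
# "mimi_ws_1" is a placeholder catalog name, not confirmed by this page.
CATALOG = {
    "name": "mimi_ws_1",
    "schemas": {
        "nppes": ["npidata"],
    },
}


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level catalog.schema.table name with each part quoted."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def list_tables(catalog: dict) -> list[str]:
    """Return the fully qualified name of every table in the catalog model."""
    return [
        qualified_name(catalog["name"], schema, table)
        for schema, tables in catalog["schemas"].items()
        for table in tables
    ]


if __name__ == "__main__":
    for name in list_tables(CATALOG):
        print(name)  # prints `mimi_ws_1`.`nppes`.`npidata`
```

Quoting every part is optional for simple lowercase names, but it keeps the same helper safe for schema or table names that contain spaces or other special characters.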
Writing SQL for the First Time!
In this video, I dive into the npidata table in the NPPES schema and show you how to write a query to explore the data. I provide a step-by-step guide on navigating the catalog menu, accessing the SQL editor, and running queries. I also mention that it may take some time for the compute clusters to warm up for the first query. No action is requested from the viewers, but the video provides valuable information for data exploration.
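A first query usually just previews a few rows. The Python sketch below builds such a preview query for the `npidata` table and, optionally, runs it outside the SQL editor with the `databricks-sql-connector` package. The catalog name `mimi_ws_1`, the row limit, and the connection values (server host name, HTTP path, access token) are assumptions and placeholders, not values given on this page; running it from Python also assumes your workspace lets you create a personal access token. As noted above, the first call may take noticeably longer while the compute warms up.

```python
def preview_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Return a SELECT that previews the first `limit` rows of a table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Quote each part with backticks, doubling any embedded backticks.
    quoted = ".".join("`" + part.replace("`", "``") + "`" for part in (catalog, schema, table))
    return f"SELECT * FROM {quoted} LIMIT {int(limit)}"


def run_preview(server_hostname: str, http_path: str, access_token: str, query: str):
    """Run the query with databricks-sql-connector (pip install databricks-sql-connector).

    The import sits inside the function so preview_query works without the package.
    The first run may be slow while the compute starts up.
    """
    from databricks import sql

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()


if __name__ == "__main__":
    # "mimi_ws_1" is a hypothetical catalog name; nppes.npidata comes from the video.
    query = preview_query("mimi_ws_1", "nppes", "npidata", limit=5)
    print(query)  # SELECT * FROM `mimi_ws_1`.`nppes`.`npidata` LIMIT 5
    # To run it, supply your own workspace values (placeholders, not real settings):
    # rows = run_preview("<server-hostname>", "<http-path>", "<access-token>", query)
```

Keeping the query builder separate from the connection code means the generated SQL can also be pasted straight into the SQL editor.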
Resources
Data Engineering
Learn about how we downloaded and ingested thousands of public datasets into our data lakehouse.