Data Catalog

This page lists the datasets currently ingested in mimi_ws_1, a data lakehouse workspace. In most cases we preserved the original table formats, with two exceptions. First, we converted column names to snake case, e.g., Provider Name becomes provider_name. Second, we added three columns to every table:

  • mimi_src_file_name: This column shows the name of the source file the data came from.
  • mimi_src_file_date: This column shows the date stamp of the source file. For example, the monthly NPPES data files carry a date suffix in their file names. We extract that date from the file name and store it in this column. The date can be either the file's creation date or the measurement cut-off date. Please read the table description for more details.
  • mimi_dlt_load_date: This column shows when the data was loaded into the lakehouse as a Delta table.
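
The three added columns make it possible to tell snapshots apart in tables that hold more than one source file. The queries below are a minimal sketch, not a prescribed pattern. They use mimi_ws_1.nppes.npidata as the example and assume that mimi_src_file_date is stored in a sortable type (such as a date), so that MAX and ORDER BY follow chronological order.

```sql
-- List the snapshots available in the table, newest first.
SELECT mimi_src_file_date,
       COUNT(*)                           AS row_count,
       COUNT(DISTINCT mimi_src_file_name) AS source_file_count,
       MAX(mimi_dlt_load_date)            AS last_loaded
FROM mimi_ws_1.nppes.npidata
GROUP BY mimi_src_file_date
ORDER BY mimi_src_file_date DESC;

-- Preview only the rows that came from the most recent source file.
SELECT *
FROM mimi_ws_1.nppes.npidata
WHERE mimi_src_file_date = (
  SELECT MAX(mimi_src_file_date)
  FROM mimi_ws_1.nppes.npidata
)
LIMIT 10;
```

Filtering on mimi_src_file_date rather than mimi_dlt_load_date keeps the result tied to the source file's own date, which may differ from the day the file was loaded.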

The Catalog section of the workspace provides more details of these tables and columns. Please log in to the workspace and navigate to the Catalog menu to learn more about the datasets.

-- In the list below, each top-level heading gives the {schema}
-- and each entry under "Tables" gives the {table}.
SELECT * FROM mimi_ws_1.{schema}.{table} LIMIT 10;
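
As an illustration, the statements below browse the same catalog information from a SQL editor and then fill in the template for one table from the list. This is a sketch that assumes the workspace accepts Spark SQL / Databricks SQL metadata commands (SHOW SCHEMAS, SHOW TABLES, DESCRIBE TABLE); the Catalog menu remains the reference for table and column descriptions.

```sql
-- List the schemas in the workspace catalog.
SHOW SCHEMAS IN mimi_ws_1;

-- List the tables in one schema, here the CDC schema.
SHOW TABLES IN mimi_ws_1.cdc;

-- Show the column names and types of one table.
DESCRIBE TABLE mimi_ws_1.cdc.nndss;

-- The template with {schema} = cdc and {table} = nndss.
SELECT * FROM mimi_ws_1.cdc.nndss LIMIT 10;
```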

See the list of schemas and tables below:

AHRQ - mimi_ws_1.ahrq

CDC - mimi_ws_1.cdc

  • Description: Datasets from the Centers for Disease Control and Prevention (CDC)
  • Tables:
    • nhanes_demo_demographic_variables_sample_weights: CDC NHANES DEMO Demographic Variables & Sample Weights
    • nhanes_exam_blood_pressure: CDC NHANES EXAM Blood Pressure
    • nhanes_exam_body_measures: CDC NHANES EXAM Body Measures
    • nhanes_lab_albumin_creatinine_urine: CDC NHANES LAB Albumin & Creatinine - Urine
    • nhanes_lab_alpha1acid_glycoprotein_serum_surplus: CDC NHANES LAB Alpha-1-Acid Glycoprotein - Serum (Surplus)
    • nhanes_lab_cholesterol_hdl: CDC NHANES LAB Cholesterol - HDL
    • nhanes_lab_cholesterol_ldl_triglycerides: CDC NHANES LAB Cholesterol - LDL & Triglycerides
    • nhanes_lab_cholesterol_total: CDC NHANES LAB Cholesterol - Total
    • nhanes_lab_fasting_questionnaire: CDC NHANES LAB Fasting Questionnaire
    • nhanes_lab_glycohemoglobin: CDC NHANES LAB Glycohemoglobin
    • nhanes_lab_glyphosate_glyp_urine: CDC NHANES LAB Glyphosate (GLYP) - Urine
    • nhanes_lab_highsensitivity_creactive_protein: CDC NHANES LAB High-Sensitivity C-Reactive Protein
    • nhanes_lab_insulin: CDC NHANES LAB Insulin
    • nhanes_lab_oral_glucose_tolerance_test: CDC NHANES LAB Oral Glucose Tolerance Test
    • nhanes_lab_plasma_fasting_glucose: CDC NHANES LAB Plasma Fasting Glucose
    • nhanes_lab_standard_biochemistry_profile: CDC NHANES LAB Standard Biochemistry Profile
    • nhanes_metadata: National Health and Nutrition Examination Survey (NHANES) Metadata
    • nhanes_qre_acculturation: CDC NHANES QRE Acculturation
    • nhanes_qre_air_quality: CDC NHANES QRE Air Quality
    • nhanes_qre_alcohol_use: CDC NHANES QRE Alcohol Use
    • nhanes_qre_blood_pressure_cholesterol: CDC NHANES QRE Blood Pressure & Cholesterol
    • nhanes_qre_bowel_health: CDC NHANES QRE Bowel Health
    • nhanes_qre_cardiovascular_health: CDC NHANES QRE Cardiovascular Health
    • nhanes_qre_diabetes: CDC NHANES QRE Diabetes
    • nhanes_qre_hospital_utilization_access_to_care: CDC NHANES QRE Hospital Utilization & Access to Care
    • nhanes_qre_income: CDC NHANES QRE Income
    • nhanes_qre_kidney_conditions: CDC NHANES QRE Kidney Conditions
    • nhanes_qre_medical_conditions: CDC NHANES QRE Medical Conditions
    • nhanes_qre_preventive_aspirin_use: CDC NHANES QRE Preventive Aspirin Use
    • nhanes_qre_smoking_adult_recent_tobacco_use_youth_cigarettetobacco_use: CDC NHANES QRE Smoking - Adult Recent Tobacco Use & Youth Cigarette/Tobacco Use
    • nhanes_qre_smoking_cigarette_use: CDC NHANES QRE Smoking - Cigarette Use
    • nhanes_qre_smoking_household_smokers: CDC NHANES QRE Smoking - Household Smokers
    • nhanes_qre_smoking_recent_tobacco_use: CDC NHANES QRE Smoking - Recent Tobacco Use
    • nndss: National Notifiable Diseases Surveillance System (NNDSS)
    • nwss_covid: National Wastewater Surveillance System (NWSS) for Covid
    • nwss_mpox: National Wastewater Surveillance System (NWSS) for Mpox
    • places_censustract: PLACES: Local Data for Better Health - census tract-level, multiyear
    • places_county: PLACES: Local Data for Better Health - county-level, multiyear
    • places_zcta: PLACES: Local Data for Better Health - ZCTA-level, multiyear
    • svi_censustract_multiyears: Social Vulnerability Index at Census Tract-Level - multiyear
    • svi_censustract_y2000: Social Vulnerability Index at Census Tract-Level - Year 2000
    • svi_censustract_y2010: Social Vulnerability Index at Census Tract-Level - Year 2010
    • svi_censustract_y2014: Social Vulnerability Index at Census Tract-Level - Year 2014
    • svi_censustract_y2016: Social Vulnerability Index at Census Tract-Level - Year 2016
    • svi_censustract_y2018: Social Vulnerability Index at Census Tract-Level - Year 2018
    • svi_censustract_y2020: Social Vulnerability Index at Census Tract-Level - Year 2020
    • svi_censustract_y2022: Social Vulnerability Index at Census Tract-Level - Year 2022
    • svi_county_multiyears: Social Vulnerability Index at County-Level - multiyear
    • svi_county_y2000: Social Vulnerability Index at County-Level - Year 2000
    • svi_county_y2010: Social Vulnerability Index at County-Level - Year 2010
    • svi_county_y2014: Social Vulnerability Index at County-Level - Year 2014
    • svi_county_y2016: Social Vulnerability Index at County-Level - Year 2016
    • svi_county_y2018: Social Vulnerability Index at County-Level - Year 2018
    • svi_county_y2020: Social Vulnerability Index at County-Level - Year 2020
    • svi_county_y2022: Social Vulnerability Index at County-Level - Year 2022
    • urbanrural_classification: NCHS Urban-Rural Classification Scheme for Counties
    • vsrr_drugoverdose: Vital Statistics Rapid Reporting (VSRR) for Drug Overdose

Census - mimi_ws_1.census

CMS Coding & Billing Section - mimi_ws_1.cmscoding

CMS Payment Section - mimi_ws_1.cmspayment

Data.CMS.gov - mimi_ws_1.datacmsgov

Data Commons - mimi_ws_1.datacommons

  • Description: Datasets from the Data Commons project
  • Tables:

Data.Healthcare.gov - mimi_ws_1.datahealthcaregov

  • Description: Datasets from the data.healthcare.gov site
  • Tables:
    • formulary_details: Plan Formulary Details, e.g., Drug Name, RxNorm ID - multimonth
    • mrf_xlsx: Machine Readable File (MRF) URLs - multimonth
    • plan: Plan Information, e.g., Plan Name, ID - multimonth
    • plan_formulary_base: Plan Formulary Base Information - multimonth
    • provider_addresses: Provider Addresses from the Provider Directories - multimonth
    • provider_base: Provider Directory (base/master) - multimonth
    • provider_plans: Provider to Contracted Plans from the Provider Directories - multimonth

Data.Medicaid.gov - mimi_ws_1.datamedicaidgov

CMS DE-SynPUF - mimi_ws_1.desynpuf

  • Description: CMS Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF)
  • Tables:
    • beneficiary_summary: Beneficiary Summary from 2008 to 2010
    • carrier_claims: Carrier Claims from 2008 to 2010
    • inpatient_claims: Inpatient Claims from 2008 to 2010
    • outpatient_claims: Outpatient Claims from 2008 to 2010
    • prescription_drug_events: Prescription Drug Events from 2008 to 2010

Environmental Protection Agency - mimi_ws_1.epa

FDA - mimi_ws_1.fda

  • Description: Datasets from the U.S. Food & Drug Administration (FDA)
  • Tables:
    • adverse_event_base: Drug Adverse Event - Base Table - multiquarter
    • adverse_event_drug: Drug Adverse Event - Drug Table, a part of the Drug Adverse Event Base table
    • adverse_event_reaction: Drug Adverse Event - Reaction Table, a part of the Drug Adverse Event Base table
    • enforcement: Drug Recall Enforcement - Base Table
    • enforcement_ndc_detail: Drug Recall Enforcement - Package NDC Table, a part of the Drug Recall Enforcement Base table
    • ndc_directory: NDC Directory - multiweek
    • ndc_label: Drug Package Labels - full text data
    • ndc_to_active_ingredients: NDC to Active Ingredients Mapping - a part of the NDC Directory
    • ndc_to_pharm_class: NDC to Pharmacologic Class Mapping - a part of the NDC Directory
    • ndc_to_rxcui: NDC to RxCUI Mapping - a part of the NDC Directory
    • orangebook_exclusivity: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - exclusivity info, multiweek
    • orangebook_patent: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - patent info, multiweek
    • orangebook_products: Approved Drug Products with Therapeutic Equivalence Evaluations, aka Orange Book - products, multiweek
    • purplebook: All FDA-licensed (approved) biological products regulated by the Center for Drug Evaluation and Research (CDER), aka Purple Book - multimonth

Graham Center - mimi_ws_1.grahamcenter

HealthIT - mimi_ws_1.healthit

HHS-OIG - mimi_ws_1.hhsoig

HRSA - mimi_ws_1.hrsa

HUDUser - mimi_ws_1.huduser

  • Description: Datasets from huduser.gov - a part of the Office of Policy Development and Research, PD&R
  • Tables:
    • cbsa_to_zip: Core-Based Statistical Area (CBSA) to ZIP Code Crosswalk (raw data) - multiyear
    • cbsa_to_zip_otm: CBSA to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • county_to_zip: County to ZIP Code Crosswalk (raw data) - multiyear
    • county_to_zip_otm: County to ZIP crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest
    • tract_to_zip: Census Tract to ZIP Code Crosswalk (raw data) - multiyear
    • tract_to_zip_mto: Census Tract to ZIP crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_cbsa: ZIP Code to Core-Based Statistical Area (CBSA) (raw data) - multiyear
    • zip_to_cbsa_mto: ZIP to CBSA crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_county: ZIP Code to County Crosswalk (raw data) - multiyear
    • zip_to_county_mto: ZIP to County crosswalk, many-to-one (mto) mapping based on the residential size - derived, latest
    • zip_to_tract: ZIP Code to Census Tract Crosswalk (raw data) - multiyear
    • zip_to_tract_otm: ZIP to Census Tract crosswalk, one-to-many (otm) mapping based on the residential size - derived, latest

MedlinePlus - mimi_ws_1.medlineplus

  • Description: Datasets from MedlinePlus - knowledge base, XML
  • Tables:

NBER - mimi_ws_1.nber

Neighborhood Atlas - mimi_ws_1.neighborhoodatlas

  • Description: Datasets from the Neighborhood Atlas - Area Deprivation Index
  • Tables:
    • adi_censusblock: Area Deprivation Index (ADI) Original (Census Block Group Level)
    • adi_censustract: Area Deprivation Index (ADI) Aggregated (Census Tract Level, USE WITH CAUTION)
    • adi_county: Area Deprivation Index (ADI) Aggregated (County Level, USE WITH CAUTION)

NPPES - mimi_ws_1.nppes

  • Description: Datasets from NPPES (National Plan and Provider Enumeration System)
  • Tables:
    • address_census_geocoder_dedup: De-duplicated Geocoding Results for the address_key table
    • address_census_geocoder_raw: Raw Geocoding Results from the US Census Geocoder, derived from the address_key table
    • address_key: Address Key, A collection of Unique Address Strings for all providers and times
    • deactivated: Deactivated Provider List
    • endpoint: Health Information Exchange (HIE) Endpoints, i.e., provider HIE contact address - multiyear
    • endpoint_se: HIE Endpoints formatted with Start and End dates
    • license_se: Provider License Data with Start and End dates, derived from npidata
    • mongodb_export: Data Extract for npi-db.org (a demo project)
    • npi_to_address: NPI to Geocoded Address (both practice and mail address), only for the latest npidata batch
    • npidata: NPIDATA - the base NPI directory, multiyear
    • openpayments: Open Payment Summaries for npi-db.org, derived from the openpayments schema
    • otherid_ccn_se: CCNs (CMS Certification Number, often used for facilities) with Start and End dates
    • otherid_se: Other Provider IDs with Start and End dates, derived from npidata
    • othername: Other Business Names such as DBA - multiyear
    • othername_se: Other Business Names formatted with Start and End dates
    • pl: Other Practice Locations - multiyear
    • pl_se: Other Practice Locations formatted with Start and End dates
    • taxonomy_se: Provider Taxonomies with Start and End dates

Open Payments - mimi_ws_1.openpayments

Palmetto GBA - mimi_ws_1.palmettogba

Part C/D - mimi_ws_1.partcd

Payer MRF - mimi_ws_1.payermrf

Prescription Drug Plan - mimi_ws_1.prescriptiondrugplan

  • Description: Datasets from the Part-D Formularies and Networks section - a subsection of the data.cms.gov site
  • Tables:
    • basic_drugs_formulary: Basic Drugs Formulary File - multiquarter
    • beneficiary_cost: Beneficiary Cost File - multiquarter
    • excluded_drugs_formulary: Excluded Drugs Formulary File - multiquarter
    • geographic_locator: Geographic Locator File - multiquarter
    • indication_based_coverage_formulary: Indication Based Coverage (IBC) Formulary File - multiquarter
    • insulin_beneficiary_cost: Insulin Beneficiary Cost File - multiquarter
    • partial_gap_coverage: Partial Gap File - multiquarter
    • pharmacy_networks: Pharmacy Networks File - multiquarter
    • plan_information: Plan Information File - multiquarter
    • pricing: Pricing File - multiquarter

Provider Data Catalog - mimi_ws_1.provdatacatalog

State Government Databases - mimi_ws_1.stategov

Surgo Ventures - mimi_ws_1.surgoventures

CMS Synthetic Medicare PUF - mimi_ws_1.synmedpuf

  • Description: CMS Synthetic Medicare PUF
  • Tables:
    • beneficiary: Beneficiary Summary
    • carrier: Carrier Claims
    • dme: Durable Medical Equipment Claims
    • hha: Home Health Agency Claims
    • hospice: Hospice Claims
    • inpatient: Inpatient Claims
    • outpatient: Outpatient Claims
    • pde: Prescription Drug Events from 2008 to 2010
    • snf: Skilled Nursing Facility Claims

Synthea - mimi_ws_1.synthea

  • Description: Datasets from the MITRE Synthea project - 1.1M synthetic patients
  • Tables:
    • allergies: Patient allergy data.
    • careplans: Patient care plan data, including goals.
    • conditions: Patient conditions or diagnoses.
    • devices: Patient-affixed permanent and semi-permanent devices.
    • encounters: Patient encounter data.
    • imaging_studies: Patient imaging metadata.
    • immunizations: Patient immunization data.
    • medications: Patient medication data.
    • observations: Patient observations including vital signs and lab reports.
    • organizations: Provider organizations including hospitals.
    • patients: Patient demographic data.
    • payer_transitions: Payer Transition data (i.e. changes in health insurance).
    • payers: Payer organization data.
    • procedures: Patient procedure data including surgeries.
    • providers: Clinicians that provide patient care.
    • supplies: Supplies used in the provision of care.

Zillow - mimi_ws_1.zillow

  • Description: Datasets from Zillow (a real-estate marketplace company)
  • Tables:
    • homevalue_zip: Home Values by Zillow - time-series
    • rent_zip: Rentals by Zillow - time-series