Projects
Please reach out with any small but beautiful project ideas that need our attention. We are here to help.
We believe that, with advances in AI and modular building blocks, such small projects can have an outsized impact.
Demo Projects
- https://npi-db.org/: A simple NPI database search engine that showcases the publicly available datasets in our workspace.
Active Projects
Creating a comprehensive provider registry using public data sources
Accurate provider information is difficult to come by, even though many public data sources describe providers. So far, these sources have been treated separately. By combining them, we expect more accurate provider data to surface, along with some interesting insights about each provider.
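One way to combine the sources is to use the NPI as the join key and keep every value each source reports, so that agreement across sources can help pick the most likely value. The sketch below is a minimal illustration under that assumption. The function names (`merge_provider_records`, `best_value`), field names, source names, and sample NPI are hypothetical and not tied to any dataset's actual schema; majority voting is only one simple reconciliation strategy, not a settled design.

```python
from collections import defaultdict


def merge_provider_records(*sources):
    """Combine per-source provider records into one profile per NPI.

    Each source is a (source_name, records) pair, where records is an
    iterable of dicts containing an "npi" key. For every field we keep each
    distinct value together with the sources that reported it, so
    conflicting values stay visible instead of being overwritten.
    """
    # registry[npi][field] -> {value: [source names reporting that value]}
    registry = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources:
        for record in records:
            npi = record["npi"]
            for field, value in record.items():
                if field == "npi" or value in (None, ""):
                    continue  # skip the join key and empty values
                registry[npi][field].setdefault(value, []).append(source_name)
    return registry


def best_value(field_values):
    """Pick the value reported by the most sources (ties go to the first seen)."""
    return max(field_values.items(), key=lambda item: len(item[1]))[0]
```

Keeping the per-value source list, rather than a single merged value, leaves room for smarter reconciliation later (for example, weighting sources by freshness).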
DRGPy - MSDRG in Python
This is a Python implementation of the MS-DRG algorithm used in the Medicare Prospective Payment System, whose reference implementation was originally released in Java. The project mimics the behavior of the Java implementation (not perfectly, unfortunately) and is open-sourced at https://github.com/yubin-park/drgpy
hccpy - HCC in Python
I implemented the HCC algorithm (initially released in SAS) in Python. The project has been widely adopted by VC-backed healthcare start-ups and even large enterprises: https://github.com/yubin-park/hccpy
Autoscalable PDF generator
The healthcare industry still needs many PDF-formatted documents; however, generating such documents from raw data relies on outdated technologies. Using Typst (https://typst.app/), we can create a fast and lightweight PDF generator API service.
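The core of such a service is turning raw records into Typst markup and compiling it. The sketch below assumes the `typst` command-line tool is installed and on the PATH, and uses its `typst compile <input> <output>` form. The helper names (`typst_string`, `render_table`, `compile_pdf`) are hypothetical, and the HTTP API layer and autoscaling are left out. Values are embedded as Typst string literals so that characters such as `#` or `*` in the data are not interpreted as markup.

```python
import subprocess
import tempfile
from pathlib import Path


def typst_string(text):
    """Escape a value as a Typst string literal (backslash, quote, newlines)."""
    escaped = (
        str(text)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def render_table(title, header, rows):
    """Build Typst source for a heading followed by a table of the rows."""
    columns = len(header)
    # Each cell is a content block embedding a string, e.g. [#"Jane Doe"]
    cells = [f"[#{typst_string(cell)}]" for cell in header]
    for row in rows:
        if len(row) != columns:
            raise ValueError("row length does not match header")
        cells.extend(f"[#{typst_string(cell)}]" for cell in row)
    return (
        f"= #{typst_string(title)}\n\n"
        f"#table(columns: {columns}, {', '.join(cells)})\n"
    )


def compile_pdf(typst_source, output_path):
    """Write the source to a temporary .typ file and run the Typst CLI on it."""
    with tempfile.TemporaryDirectory() as tmp:
        source_file = Path(tmp) / "document.typ"
        source_file.write_text(typst_source, encoding="utf-8")
        subprocess.run(
            ["typst", "compile", str(source_file), str(output_path)],
            check=True,  # raise CalledProcessError if compilation fails
        )
```

For example, `compile_pdf(render_table("Providers", ["NPI", "Name"], rows), "providers.pdf")` would produce a one-table PDF, assuming the CLI is available.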
Project Ideas (not implemented yet)
Make a Python version of the OIG risk audit toolkit
The risk of over-coding is growing, and the Office of Inspector General (OIG) has increased its audit intensity over the years. On 12/14/2023, the OIG publicly released its audit methodology. Although the methodology is well documented in the PDF file, the audit algorithm is not easy to apply to real data. We plan to implement the algorithm in Python, as we did for the HCC algorithm, so that healthcare data engineers can try the logic on their own data and prevent over-documentation in the future.
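We have not implemented the OIG logic yet, but one natural structure is a list of rules, each pairing a set of diagnosis codes with a check for supporting evidence in the member's claims. The sketch below shows only that structure. The `Claim` and `Rule` classes, the care-setting labels, and the example rule (an ICD-10-CM `I63` cerebral infarction code with no inpatient claim carrying it) are illustrative placeholders, not the OIG's published criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Claim:
    member_id: str
    setting: str               # illustrative labels, e.g. "inpatient" or "physician"
    dx_codes: Tuple[str, ...]  # ICD-10-CM codes without dots


@dataclass
class Rule:
    name: str
    dx_prefixes: Tuple[str, ...]                 # codes that put a member in scope
    is_supported: Callable[[List[Claim]], bool]  # evidence check over the member's claims


def flag_members(claims, rules):
    """Return (member_id, rule_name) pairs where a rule's diagnosis appears
    but its supporting-evidence check fails."""
    by_member = {}
    for claim in claims:
        by_member.setdefault(claim.member_id, []).append(claim)
    flags = []
    for member_id, member_claims in by_member.items():
        for rule in rules:
            in_scope = any(
                code.startswith(rule.dx_prefixes)  # str.startswith accepts a tuple
                for claim in member_claims
                for code in claim.dx_codes
            )
            if in_scope and not rule.is_supported(member_claims):
                flags.append((member_id, rule.name))
    return flags


# Placeholder rule for illustration only (not the OIG's criteria):
# a cerebral infarction code (I63.*) with no inpatient claim carrying it.
def has_inpatient_i63(member_claims):
    return any(
        claim.setting == "inpatient"
        and any(code.startswith("I63") for code in claim.dx_codes)
        for claim in member_claims
    )


EXAMPLE_RULES = [Rule("stroke_without_inpatient_stay", ("I63",), has_inpatient_i63)]
```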
Public alert system using Twitter, health blogs, and weather data
Twitter and weather data are real-time and can provide valuable insights for avoiding catastrophic events. For example, Mayo Clinic tweets patient education messages many times a day. As extreme weather events become more prevalent, we want to create an alert system that curates these patient education materials and combines them with real-time weather forecast data.
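A minimal sketch of the matching step, assuming forecasts and curated messages have already been collected: each hazard is a predicate over a forecast, and messages are tagged with the hazards they address. The `Forecast` fields, the temperature thresholds in `HAZARD_RULES`, and the sample messages in the usage are hypothetical; real alert criteria vary by region and would come from the forecast provider.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    location: str
    max_temp_f: float  # forecast daily high, Fahrenheit
    min_temp_f: float  # forecast daily low, Fahrenheit


# Hypothetical thresholds for illustration only.
HAZARD_RULES = {
    "heat": lambda f: f.max_temp_f >= 100,
    "cold": lambda f: f.min_temp_f <= 0,
}


def build_alerts(forecasts, messages):
    """Match forecast hazards to curated education messages.

    messages is a list of (text, tags) pairs, where tags is a set of hazard
    names such as {"heat"}. Returns (location, hazard, [texts]) tuples for
    every triggered hazard that has at least one matching message.
    """
    alerts = []
    for forecast in forecasts:
        for hazard, is_triggered in HAZARD_RULES.items():
            if not is_triggered(forecast):
                continue
            matching = [text for text, tags in messages if hazard in tags]
            if matching:
                alerts.append((forecast.location, hazard, matching))
    return alerts
```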
hccpy with FHIR - hccfier
We want to revamp the hccpy project (https://github.com/yubin-park/hccpy) to work with FHIR resources. We also want to make the package available in JavaScript.
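On the FHIR side, the main new step is pulling diagnosis codes out of `Condition` resources before passing them to the HCC logic. The sketch below assumes the input is a FHIR `Bundle` already parsed from JSON into a Python dict, and that ICD-10-CM codings use the system URI `http://hl7.org/fhir/sid/icd-10-cm`. It strips dots on the assumption that the downstream HCC mapping expects undotted codes such as `E119`. The function name is hypothetical, and filters such as `clinicalStatus` or encounter dates are omitted.

```python
ICD10CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"


def icd10_codes_from_bundle(bundle):
    """Collect unique ICD-10-CM codes from Condition resources in a Bundle dict."""
    codes = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue  # ignore Patient, Encounter, and other resources
        for coding in resource.get("code", {}).get("coding", []):
            if coding.get("system") == ICD10CM_SYSTEM and "code" in coding:
                # Assumed: the HCC mapping uses undotted codes (E11.9 -> E119)
                codes.add(coding["code"].replace(".", ""))
    return sorted(codes)
```

The resulting list could then be passed to the existing hccpy scoring code, with demographic inputs taken from the `Patient` resource.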
Indexing CMS PDF documents
CMS and CMMI publish a large number of PDF documents, which are difficult to search for relevant content. With the help of LLMs, we want to index these documents and build a machine-readable database of CMS documents. We would also need to build a web crawler that continuously checks and parses the CMS websites.
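The crawler half can start small: fetch a page, collect links that point to `.pdf` files, and hand those URLs to the parsing and LLM indexing steps. The sketch below uses only the Python standard library; the class and function names are hypothetical, and politeness controls (rate limiting, robots.txt checks) and the indexing step itself are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        # Check only the path so query strings like ?v=2 don't hide the extension
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in self.pdf_links:
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


def crawl_page(url):
    """Fetch one page and return the PDF links found on it."""
    with urlopen(url, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return find_pdf_links(html, url)
```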
Ingesting and cleaning up public MAO datasets
CMS publishes many public datasets on Medicare Advantage Organizations' (MAOs') Star Ratings performance, enrollment, and other KPIs. We want to organize this data so that other organizations do not need to repeat the same work, and we want to provide the cleaned-up data via Databricks Delta Sharing.
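Much of the cleanup is mechanical: normalizing column headers and turning display strings into numbers. The sketch below assumes rating cells look like `4.5 out of 5 stars` and that anything else (for example, a note that a plan is too new to be measured) should become a missing value; both the format and the helper names are assumptions to check against the actual files. Snake-case column names also avoid characters such as spaces, which Delta tables reject in column names unless column mapping is enabled.

```python
import re

# Assumed display format for rating cells, e.g. "4.5 out of 5 stars"
_STARS = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+out of 5 stars\s*$", re.IGNORECASE)


def parse_star_rating(raw):
    """Convert a rating display string to a float; other notes become None."""
    if raw is None:
        return None
    match = _STARS.match(str(raw))
    return float(match.group(1)) if match else None


def normalize_column(name):
    """Turn a header like ' Contract Number ' into 'contract_number'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")
```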