mimilabs

Thu Mar 07 2024 · idea

Indexing CMS PDF documents

CMS and CMMI publish many of their documents as PDFs, which are difficult to search, so relevant content is hard to find. With the help of LLMs, we want to index these documents and build a machine-readable database of them. We would also need to build a web crawler that continuously checks and parses the CMS websites.
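
As a rough illustration of the crawler piece only, the sketch below shows one way to pull PDF links out of a fetched page and to detect whether a document changed since the last crawl. It is a minimal sketch under stated assumptions, not a description of an existing implementation: the names `PdfLinkParser`, `find_pdf_links`, and `has_changed` are hypothetical, the page HTML and PDF bytes are assumed to have been downloaded already, and PDF text extraction and LLM-based indexing are out of scope here.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        # Check the path only, so query strings do not hide the extension.
        is_pdf = urlparse(url).path.lower().endswith(".pdf")
        if is_pdf and url not in self.pdf_urls:
            self.pdf_urls.append(url)


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def has_changed(url, content, seen_hashes):
    """Return True if the bytes for this URL are new or differ from the
    previous crawl, and record the new SHA-256 digest in seen_hashes."""
    digest = hashlib.sha256(content).hexdigest()
    changed = seen_hashes.get(url) != digest
    seen_hashes[url] = digest
    return changed
```

A scheduled job could call `find_pdf_links` on each listing page, download every returned URL, and send a document on to parsing and indexing only when `has_changed` returns `True`. Hashing the file contents is one simple choice for change detection; HTTP headers such as `ETag` or `Last-Modified` could serve the same purpose where the server provides them.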

Last updated: 7:15:45 PM, Sun Feb 02 2025