Web Crawling, NLP, Information Retrieval (IR)
Vertical Search Engine
Delivered a fully functional search interface that provides researchers with up-to-date, categorized access to institutional knowledge.
Objectives
- Automate the extraction of research metadata (titles, authors, publication years).
- Implement a relevance-ranking system for keyword-based queries.
- Categorize research outputs through automated subject classification.
Solution
- Built an automated metadata extraction pipeline that scrapes and parses RCIH publication outputs weekly.
- Implemented TF-IDF-based relevance ranking and BM25 scoring to improve search precision.
- Integrated a subject classification module that automatically tags research papers by healthcare domain.
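The BM25 ranking named in the Solution can be illustrated with a minimal sketch. This is not the project's actual code: the function names `tokenize` and `bm25_rank`, the sample titles, and the parameter values `k1=1.5` and `b=0.75` (commonly used defaults) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    k1 controls term-frequency saturation and b controls length
    normalization; both values here are assumed defaults.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency dampened by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical publication titles, used only to exercise the function.
titles = [
    "Diabetes care in rural clinics",
    "Machine learning for radiology imaging",
    "Rural health workforce survey",
]
ranking = bm25_rank("rural diabetes", titles)
```

Here the first title matches both query terms and ranks first, the third matches only "rural", and the second matches neither and scores zero. In a real index the scored text would likely combine titles with abstracts or other extracted metadata.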