Apr 09, 2025

Automated Roster Processor

A Python script that extracts names from Word documents and categorizes them in Excel, even when section headings are misspelled or inconsistent.

A Python project that uses the pandas and python-docx libraries to extract text (i.e., unique names) from multiple Word documents and combine it into structured Excel spreadsheets. It identifies section headings such as “HADIR BARIS” or “HADIR BERBARIS”, even when they are misspelled or written in different ways, and groups each name under the matching category. If no valid heading is found, the script flags the case for user review.
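The heading detection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes fuzzy matching with the standard library's `difflib`, and the names `KNOWN_HEADINGS`, `match_heading`, `group_names`, and the `0.8` threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical canonical headings; the real script's list is not documented.
KNOWN_HEADINGS = ["HADIR BARIS", "HADIR BERBARIS"]


def normalize(text):
    """Upper-case and collapse whitespace so case and spacing do not matter."""
    return " ".join(text.upper().split())


def match_heading(text, threshold=0.8):
    """Return the most similar known heading, or None if none is close enough.

    The threshold is an assumed value, not one taken from the project.
    """
    candidate = normalize(text)
    best, best_score = None, 0.0
    for heading in KNOWN_HEADINGS:
        score = SequenceMatcher(None, candidate, heading).ratio()
        if score > best_score:
            best, best_score = heading, score
    return best if best_score >= threshold else None


def group_names(lines):
    """Assign each non-heading line to the most recent heading seen.

    Returns (groups, unassigned). Names that appear before any valid
    heading go to `unassigned` so they can be flagged for review.
    """
    groups = {}
    unassigned = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        heading = match_heading(line)
        if heading is not None:
            current = heading
            groups.setdefault(current, [])
        elif current is None:
            unassigned.append(line)
        elif line not in groups[current]:
            # Keep only unique names within a category.
            groups[current].append(line)
    return groups, unassigned
```

With python-docx, the input lines could plausibly come from the `text` attribute of each paragraph in `docx.Document(path).paragraphs`. A similarity threshold is one possible design choice here; a stricter or looser value trades missed headings against names mistaken for headings.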

The script produces two Excel outputs:

  1. A complete list of all unique names with categories and a total count.
  2. A sorted version by category, including counts within each group.
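As a rough sketch of how the two outputs might be assembled, the example below builds the rows for each layout from a mapping of category to unique names. The mapping, the column names, the "Total"/"Count" rows, and the function names are assumptions made for illustration; the project's real sheet layout is not specified.

```python
def build_full_list(groups):
    """One row per unique name with its category, then a total row."""
    rows = [
        {"Name": name, "Category": category}
        for category, names in groups.items()
        for name in names
    ]
    rows.append({"Name": f"Total: {len(rows)}", "Category": ""})
    return rows


def build_by_category(groups):
    """Rows sorted by category, then name, with a count row after each group."""
    rows = []
    for category in sorted(groups):
        names = sorted(groups[category])
        rows.extend({"Category": category, "Name": name} for name in names)
        rows.append({"Category": category, "Name": f"Count: {len(names)}"})
    return rows


def write_excel(rows, path):
    """Write rows to an .xlsx file.

    Requires pandas plus an Excel engine such as openpyxl; imported here
    so the row-building functions above work without them.
    """
    import pandas as pd

    pd.DataFrame(rows).to_excel(path, index=False)


# Hypothetical sample data for demonstration only.
sample = {
    "HADIR BERBARIS": ["Tan Wei"],
    "HADIR BARIS": ["Siti Aminah", "Ahmad bin Ali"],
}
full_rows = build_full_list(sample)
category_rows = build_by_category(sample)
```

Calling `write_excel(full_rows, "all_names.xlsx")` and `write_excel(category_rows, "by_category.xlsx")` would then produce the two files; both file names are placeholders.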

This automation significantly reduces manual workload, minimizes human error, and streamlines reporting processes for roster or duty management tasks.