Back to projects

Lender Draw Automation: Invoice Verification and Extraction for Construction Funding

The Problem

Construction lender draws require submitting vendor invoices for funding reimbursement. Two manual steps: verifying invoices weren't already submitted on previous draws, and extracting individual invoice PDFs. Each draw takes 5 hours without automation.

Verification is slow because vendor names vary across exports. "ABC Contractors LLC" in March appears as "ABC Contractors" in May. You cross-reference against three previous draw packages to avoid submitting duplicates.

Extraction requires choosing between two slow options. You either click individual invoices in the PMS one by one to download separate PDFs, or export one 1000-page bulk PDF and manually split it. The bulk export has no invoice separators. You scroll through identifying where each invoice starts, count pages, split the PDF, rename files with invoice numbers and vendor names.

The Solution

Two scripts: invoice verification dedupes against historical submissions, PDF extraction splits combined files into individual invoices.

The verification script loads current vendor invoices and historical processed invoices from one workbook, filters to construction GL accounts, normalizes vendor names, matches on vendor + invoice number, outputs only new invoices.

The extraction script reads the combined PDF, detects invoice numbers using regex, identifies page counts from "Page X of Y" text, splits each invoice into a separate file named with invoice number and vendor.

Invoice Verification Code

Loading and Column Detection

current = pd.read_excel(WORKBOOK_PATH, 
                         sheet_name=CURRENT_EXPORT_SHEET, 
                         header=CURRENT_HDR_ROW_XL-1)
hist = pd.read_excel(WORKBOOK_PATH, 
                      sheet_name=HISTORICAL_LOG_SHEET, 
                      header=HIST_HDR_ROW_XL-1)

VENDOR_COL_CUR = find_col(current, ["Vendor"])
INVOICE_COL_CUR = find_col(current, ["Invoice Number", 
                                     "Invoice #"])

The script reads two sheets from one workbook: current period invoices and historical processed invoices. The find_col function searches for columns by name variants because PMS labels change month to month.

GL Filtering

current[GLNUM_COL + "_int"] = to_int_digits(current[GLNUM_COL])

gl = current[GLNUM_COL + "_int"]
mask_gl_range = (gl >= GL_LOW) & (gl <= GL_HIGH)
mask_exclude = gl.isin(list(GL_EXCLUDE))
filtered = current[mask_gl_range & (~mask_exclude)]

Construction draws only include specific GL accounts. The script extracts numeric codes from text like "6450 - Building Repairs" and keeps only the configured range while excluding specific accounts.

Vendor Name Normalization

def vendor_key_for_match(series: pd.Series) -> pd.Series:
    base = vendor_key(series.fillna(""))
    first_word = (series.fillna("")
                  .astype(str).str.strip().str.casefold()
                  .str.split().str[0])
    
    mask = pd.Series(False, index=base.index)
    for pref in VENDOR_FAMILY_PREFIXES:
        mask = mask | base.str.startswith(pref)
    
    return base.mask(mask, first_word)

Standard normalization lowercases and strips whitespace. For vendor families with multiple entity names (ABC Contractors LLC, ABC Contractors Inc), the script matches only the first word if configured. This prevents the same invoice from different legal entities appearing as separate.

Duplicate Detection

filtered["_vendor_key"] = vendor_key_for_match(
                            filtered[VENDOR_COL_CUR])
filtered["_invno_key"] = invoice_key(
                           filtered[INVOICE_COL_CUR])

hist["_vendor_key"] = vendor_key_for_match(
                        hist[VENDOR_COL_HIST])
hist["_invno_key"] = invoice_key(hist[INVOICE_COL_HIST])

match_left = filtered.merge(
    hist[["_vendor_key", "_invno_key"]].drop_duplicates(),
    on=["_vendor_key", "_invno_key"],
    how="left",
    indicator=True
)

new_txns = match_left[match_left["_merge"] == "left_only"]

The merge joins current invoices to historical invoices on normalized vendor + invoice number. Rows marked "left_only" are new invoices not found in history.

Summary Rollup

final = (new_txns
         .groupby([VENDOR_COL_CUR, INVOICE_COL_CUR, 
                   BLDG_COL, GLNAME_COL], 
                  as_index=False)
         .agg({
             AMOUNT_COL: "sum",
             INVDATE_COL: first_nonnull,
             POSTMONTH_COL: first_nonnull
         }))

The groupby collapses multiple line items per invoice into one row with total amount and first non-blank dates.

PDF Extraction Code

Document Detection

rx_doc_id = re.compile(
    r"(?:INVOICE|DOCUMENT)\s*(?:NO\.?|#|:)?\s*" +
    r"([A-Z]{2}\s*[-:]?\s*\d{5,8})",
    re.IGNORECASE
)

for i, text in enumerate(page_texts):
    for m in rx_doc_id.finditer(text):
        doc_id = normalize_id(m.group(1))
        starts_by_id.setdefault(doc_id, []).append(i)

The script reads every page's text and searches for invoice numbers. When found, it records which page that invoice starts on.

Page Count Detection

rx_pagecount = re.compile(
    r"page\s*(\d+)\s*(?:of|/)\s*(\d+)", 
    re.IGNORECASE
)

mc = rx_pagecount.search(first_text)
if mc:
    total_pages = max(1, int(mc.group(2)))
else:
    total_pages = 1

Most invoices include "Page 1 of 3" text. The script extracts the total page count. If not found, assumes 1 page.

File Creation

writer = PdfWriter()
for p in range(start_idx, end_idx):
    writer.add_page(reader.pages[p])

out_file = out_dir / f"{doc_id} - {label}.pdf"
with open(out_file, "wb") as f:
    writer.write(f)

The script copies pages into a new PDF. Filename combines invoice number with extracted vendor name.

The Output

The verification script writes five tabs. Raw_Input shows all vendor invoices from the PMS. Filtered_GL shows construction GL accounts after the range filter. New_Transactions shows invoices that don't match historical records. Final_Summary shows one row per unique invoice with rolled-up amounts. Audit shows row counts at each stage.

The Final_Summary tab is the lender submission. Vendor, Invoice Number, Invoice Date, Building, GL Account Name, Amount, Payment info. Sorted by Vendor then Invoice Number. The lender reviews this list, matches to extracted PDFs, processes the draw.

The extraction script creates individual PDFs in the output folder. Filenames include invoice number and vendor name: "AB12345 - Smith Construction.pdf". No manual renaming required.

What This Enables

The scripts run in under 2 minutes combined. The verification script processes the PMS export and identifies new invoices in 45 seconds. The extraction script splits a 1000-page PDF into individual files in 30 seconds. Draw preparation that previously required manual invoice verification, PDF splitting, and file renaming now runs automatically.

Vendor name normalization handles formatting variations. Invoices from the same vendor using different legal entity names match correctly. The first-word matching rule applies to configured vendor families that commonly appear with subsidiary or division names.

File naming follows a consistent pattern with invoice number and vendor name. PDFs export as "AB12345 - Smith Construction.pdf" without manual renaming. The audit trail shows what filtered in or out at each stage.

Get in Touch