How We Get the Data

Transparency is our foundation. Learn exactly how we collect, validate, and process Nancy Pelosi's stock trading data from official government sources.

Daily Data Updates
100% Official Sources
4-Tier Backup System
2:00 AM UTC Sync

Data Collection Process

1. Automated Trigger

GitHub Actions runs daily at 2:00 AM UTC

2. Data Source Priority (Fallback System)

1st: Official House Clerk (Most Authoritative)
Downloads yearly ZIP files and parses Periodic Transaction Report (PTR) PDFs

2nd: GitHub APIs (Free & Fast)
House & Senate Stock Watcher community data

3rd: FMP API (Requires $149/mo)
Financial Modeling Prep paid service

4th: Sample Data (Always Available)
Fallback to ensure site functionality

3. Data Processing & Validation

• PDF parsing
• Duplicate removal
• Date standardization
• Amount validation
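
As a rough illustration of the date standardization and duplicate removal steps listed above, here is a minimal Python sketch. The accepted input date formats and the record field names (ticker, date, type, amount) are assumptions made for this example, not the site's actual schema.

```python
from datetime import datetime

# Assumed input formats; real filings may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_date(raw: str) -> str:
    """Convert a date string in any assumed format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def remove_duplicates(trades: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first occurrence of each trade, comparing on normalized fields."""
    seen = set()
    unique = []
    for trade in trades:
        iso_date = standardize_date(trade["date"])
        key = (trade["ticker"], iso_date, trade["type"], trade["amount"])
        if key not in seen:
            seen.add(key)
            unique.append({**trade, "date": iso_date})
    return unique


# Made-up records: the same trade reported with two date spellings.
raw_trades = [
    {"ticker": "XYZ", "date": "01/02/2024", "type": "purchase", "amount": "$1,001 - $15,000"},
    {"ticker": "XYZ", "date": "2024-01-02", "type": "purchase", "amount": "$1,001 - $15,000"},
]
cleaned = remove_duplicates(raw_trades)
print(len(cleaned), cleaned[0]["date"])  # 1 2024-01-02
```

Normalizing the date before building the comparison key is what lets the two spellings of the same trade collapse into one record.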

Quality Check

Validate data integrity

Backup Creation

Save timestamped backup

Website Update

Deploy new data live

Timing & Legal Requirements

Legal Deadline: Members of Congress must file within 45 days of a trade
Our Update: Data appears within 24 hours of government publication
Sync Schedule: Daily at 2:00 AM UTC (10:00 AM Beijing time)
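
The time conversion in the sync schedule can be checked with a few lines of Python. This sketch relies on the fact that Beijing observes UTC+8 all year with no daylight saving time; the function name is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Beijing is UTC+8 year-round (no daylight saving time).
BEIJING = timezone(timedelta(hours=8), name="CST")


def sync_time_in_beijing(day: datetime) -> datetime:
    """Return the 02:00 UTC sync instant on `day`, expressed in Beijing time."""
    sync_utc = day.replace(hour=2, minute=0, second=0, microsecond=0, tzinfo=timezone.utc)
    return sync_utc.astimezone(BEIJING)


local = sync_time_in_beijing(datetime(2024, 1, 15))
print(local.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-01-15 10:00 CST
```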

Our Methodology

Our data collection follows a priority-based hierarchy. Official government sources come first and third-party APIs second, with automatic fallbacks that keep the site available when a source fails.

Legal Compliance & Timing

Members of Congress must disclose stock trades no later than 45 days after the transaction date, as mandated by the STOCK Act of 2012. This means:

  • Trades may appear weeks after they actually occurred
  • Recent trading activity might not yet be disclosed
  • All data comes from legally required government filings
  • We cannot predict or report undisclosed future trades
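
To make the 45-day window concrete, the following Python sketch computes the latest filing date for a trade and the lag between a trade and its disclosure. The dates are invented and the function names are hypothetical.

```python
from datetime import date, timedelta

FILING_WINDOW_DAYS = 45  # outer limit described above


def filing_deadline(trade_date: date) -> date:
    """Latest date on which a trade made on `trade_date` may be disclosed."""
    return trade_date + timedelta(days=FILING_WINDOW_DAYS)


def disclosure_lag_days(trade_date: date, filed_date: date) -> int:
    """Number of days between the trade and its disclosure."""
    return (filed_date - trade_date).days


trade = date(2024, 1, 2)  # made-up example date
print(filing_deadline(trade))                       # 2024-02-16
print(disclosure_lag_days(trade, date(2024, 2, 1)))  # 30
```

A trade made on January 2 could therefore remain undisclosed until mid-February, which is why recent activity may be missing from the site.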

Data Sources in Detail

Priority 1
U.S. House Clerk Official Website

Primary source for all congressional financial disclosures

Data Format

ZIP archives containing PDF files

Update Frequency

Real-time as members file reports

Cost

Free

✅ Advantages
  • Most authoritative and legally binding
  • Direct from government source
  • Includes all required PTR forms
  • Real-time updates as filed
⚠️ Limitations
  • PDF format requires parsing
  • Filings may lag trades by up to 45 days
  • Website occasionally experiences downtime
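
Since this source delivers ZIP archives of PDFs, the first processing step is to find the PDF members inside an archive. The Python sketch below shows that step with the standard zipfile module. It builds a tiny archive in memory instead of downloading one, and the file names inside it are invented.

```python
import io
import zipfile


def list_pdf_members(zip_bytes: bytes) -> list[str]:
    """Return the names of all PDF files inside a ZIP archive, sorted."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".pdf")
        )


# Build a small in-memory archive with placeholder contents for the demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("20240001.pdf", b"%PDF-1.4 placeholder")
    archive.writestr("index.txt", b"not a pdf")

pdf_names = list_pdf_members(buffer.getvalue())
print(pdf_names)  # ['20240001.pdf']
```

Extracting trade rows from each PDF requires a separate PDF parsing library and is not shown here.
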
Priority 2
House Stock Watcher (GitHub)

Community-maintained repository of parsed congressional data

Data Format

JSON/CSV files with daily updates

Update Frequency

Daily (when available)

Cost

Free

✅ Advantages
  • Pre-processed JSON/CSV format
  • Fast API access
  • No rate limits
  • Free to use
⚠️ Limitations
  • Depends on community maintenance
  • May lag behind official source
  • Repository availability not guaranteed
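
Pre-processed community data is much simpler to consume than PDFs. As a sketch, the Python below filters CSV rows by member name using the standard csv module. The column names and sample rows are assumptions made for the example and may not match the real files.

```python
import csv
import io


def filter_by_representative(csv_text: str, name_fragment: str) -> list[dict[str, str]]:
    """Return rows whose 'representative' column contains the given name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = name_fragment.lower()
    return [
        row for row in reader
        if needle in (row.get("representative") or "").lower()
    ]


# Invented rows with assumed column names; not real trading data.
sample_csv = """representative,ticker,transaction_date,amount
Hon. Nancy Pelosi,XYZ,2024-01-02,"$1,001 - $15,000"
Hon. Example Member,ABC,2024-01-03,"$15,001 - $50,000"
"""

matches = filter_by_representative(sample_csv, "pelosi")
print(len(matches), matches[0]["ticker"])  # 1 XYZ
```
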
Priority 3
Financial Modeling Prep API

Premium financial data API with congressional trading endpoints

Data Format

JSON API responses

Update Frequency

As available

Cost

$149/month

✅ Advantages
  • Professional API with SLA
  • Structured JSON responses
  • Historical data available
  • Reliable infrastructure
⚠️ Limitations
  • Requires $149/month Ultimate plan
  • Currently returns HTTP 403 (Forbidden) errors
  • Not specifically focused on Pelosi trades
Priority 4
Generated Sample Data

High-quality sample data to ensure website functionality

Data Format

Static JSON file

Update Frequency

As needed for testing

Cost

Free

✅ Advantages
  • Always available
  • Maintains site functionality
  • Realistic data structure
  • No external dependencies
⚠️ Limitations
  • Not real trading data
  • Used only as last resort
  • Limited historical depth

Intelligent Fallback System

Our system automatically tries data sources in priority order. If a source fails, the system falls back to the next one in the list, so the site always has data to display.

Current Status:
  • 🟢 Official House Clerk: Working reliably
  • 🟡 GitHub Sources: Variable availability
  • 🔴 FMP API: Requires paid subscription
  • 🟢 Sample Data: Always available as backup
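
The priority-order fallback described above can be sketched in Python as a loop over fetch functions. This is an illustrative sketch and not the site's actual code: the source names, the stand-in fetchers, and the rule that an empty result counts as a failure are all assumptions.

```python
from typing import Callable

Trade = dict[str, str]
Fetcher = Callable[[], list[Trade]]


def fetch_with_fallback(sources: list[tuple[str, Fetcher]]) -> tuple[str, list[Trade]]:
    """Try each source in order; return the first name and non-empty result."""
    errors = []
    for name, fetch in sources:
        try:
            trades = fetch()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
            continue
        if trades:
            return name, trades
        errors.append(f"{name}: returned no trades")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real fetchers.
def house_clerk() -> list[Trade]:
    raise ConnectionError("site down")


def community_data() -> list[Trade]:
    return []


def sample_data() -> list[Trade]:
    return [{"ticker": "XYZ"}]


used, trades = fetch_with_fallback([
    ("house_clerk", house_clerk),
    ("community_data", community_data),
    ("sample_data", sample_data),
])
print(used)  # sample_data
```

Returning the name of the source that succeeded lets the site label whether it is showing official filings or sample data.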

Quality Assurance

Data Validation
  • PDF parsing with error detection
  • Duplicate trade removal
  • Date format standardization
  • Amount range validation
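
Periodic Transaction Reports disclose trade sizes as dollar ranges, not exact figures, so amount validation largely means checking that a range parses and is ordered correctly. A minimal Python sketch follows, assuming the range is written as "$low - $high"; the function name is hypothetical.

```python
import re

# Assumed textual form: "$1,001 - $15,000"
_RANGE = re.compile(r"^\$(\d[\d,]*)\s*-\s*\$(\d[\d,]*)$")


def parse_amount_range(text: str) -> tuple[int, int]:
    """Parse a '$low - $high' string into integers, rejecting malformed input."""
    match = _RANGE.match(text.strip())
    if not match:
        raise ValueError(f"not a valid amount range: {text!r}")
    low, high = (int(group.replace(",", "")) for group in match.groups())
    if low > high:
        raise ValueError(f"range is inverted: {text!r}")
    return low, high


print(parse_amount_range("$1,001 - $15,000"))  # (1001, 15000)
```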
Backup & Recovery
  • Automatic daily backups
  • Multi-source data reconciliation
  • Version control for data changes
  • Manual verification for anomalies

Technical Implementation

Automated Pipeline
1. GitHub Actions Trigger
Runs daily at 2:00 AM UTC (10:00 AM Beijing time)

2. Data Collection
Scrapes official sources with intelligent fallbacks

3. Processing & Validation
Parses PDFs, validates data, removes duplicates

4. Deployment
Updates website data and creates backup

Our Transparency Promise

Open Source: Our data collection methodology is transparent and our scraping scripts are available for public review.

No Bias: We present data exactly as disclosed in official government filings without editorial interpretation.

Timely Updates: When new trades are disclosed, they appear on our site within 24 hours of government publication.

Error Correction: If we discover any data errors, we correct them immediately and maintain a record of changes.
