Medicare Provider Billing Fraud

Wall Street Journal · 2015 winner

Rubric

WSJ: Medicare Billing Fraud and Abuse

Dataset given to agent: Medicare Provider Utilization and Payment Public Use Files (CY 2012–2015). Tab-delimited text files with 26 columns: NPI, provider name/credentials/gender, entity code, address (street, city, zip, state, country), provider type (specialty), Medicare participation indicator, place of service, HCPCS code/description, drug indicator flag, line service count, beneficiary unique count, beneficiary day service count, average Medicare allowed amount, average submitted charge amount, average Medicare payment amount, and average Medicare standard amount. Each row represents one provider–procedure pair for the year.

Lead A: Individual providers billing extraordinary amounts for a single procedure

Ranking providers by total Medicare payments from a single HCPCS code should reveal extreme outliers whose revenue is dominated by one procedure — a red flag for overuse or fraud.

  • 2,300+ providers earned $500,000+ from Medicare in 2012 from a single procedure/service
  • Ronald S. Weaver (internist, Los Angeles): 98% of $2.3 million from Enhanced External Counterpulsation (EECP), a cardiac procedure — 239 cardiologists nationwide billed for EECP, but most used it on fewer than 5% of patients; Cleveland Clinic's 141 cardiologists performed it on only 6 patients that year
  • Evangelos G. Geraniotis (urologist, Hyannis, MA): nearly $1 million from cystoscopy/fulguration — performed the procedure 1,757 times vs. an average of 38 for 973 other urologists in the same specialty
  • Gary L. Marder (dermatologist, Port St. Lucie, FL): $2.41 million from a radiation treatment procedure, billing 15,610 times for 94 patients (average 166 treatments per patient; appropriate maximum is ~35)
  • James E. Beale (orthopedic surgeon, Detroit): $3.7 million with zero surgeries; $2.3 million from "manual therapy techniques" — billed 107,670 times vs. a specialty average of 520

Lead B: Providers billing at rates far exceeding specialty peers

Ranking total Medicare payments by provider type should identify statistical outliers within each specialty.

  • Florida Cancer Specialists: 28 oncologists appeared in the top 100 nationally by 2012 Medicare payments; 22 received over $3 million each
  • Vikas Malhotra (Florida Cancer Specialists): $8.4 million total Medicare payments
  • Fathalla Mashali (pain specialist, Massachusetts): $2.8 million in drug-test reimbursements alone (highest single doctor) — later arrested for healthcare fraud
  • 11 of top 20 individual recipients of Procrit (anemia drug) payments were from a single practice group (Florida Cancer Specialists)

Lead C: Suspicious services-per-beneficiary ratios

Computing line_srvc_cnt ÷ bene_unique_cnt for each provider–procedure pair should flag providers treating each patient an implausibly high number of times.

  • Weaver: 16,619 EECP claims for 615 patients
  • Marder: 15,610 radiation treatments for 94 patients (166 per patient; max appropriate is ~35 per lesion)
  • Beale: 107,670 massage/manipulation sessions; average 149 per patient
  • Robert Wadley (pain specialist, Raleigh, NC): drug testing accounted for 82% of practice's Medicare revenue; lab averaged $1,265 per patient

Lead D: Concentration of spending — single providers dominating national procedure volumes

Aggregating total payments by HCPCS code and identifying what share goes to the top 1–5 providers should reveal monopolistic billing patterns.

  • Health Diagnostic Laboratory (HDL) received 64% of all Medicare reimbursements nationwide for nine specific lab procedures in 2012
  • For one electrophoresis procedure: HDL billed 262,308 times ($11.9 million); all 35 other labs combined billed only 19,621 times ($850,000) — HDL captured 93% of Medicare spending for that test
  • 21st Century Oncology's Fort Myers lab collected $7.8 million through a single provider, including nearly $5 million from FISH tests alone
  • 21st Century Oncology: average Medicare reimbursement per ordering physician of $107,700 (24x the average for non-independent labs)

Lead E: Geographic clustering of billing anomalies

Aggregating outlier providers by state should reveal regional hotspots of potential fraud or overuse.

  • Florida dominated oncology drug billing: Florida Cancer Specialists' doctors treated 11% of Procrit patients (vs. 6.2% national average) and received one-sixth of all Medicare Procrit payments
  • 22 high-tech drug tests: Medicare spending of $445 million in 2012, up 1,423% in five years — with geographic concentration in specific states
  • PCP (angel dust) testing cost Medicare $14 million in 2012 despite less than 1 in 1,000 seniors abusing illicit drugs

Lead F: Year-over-year spending surges for specific procedures

Comparing totals by HCPCS code across the four years (2012–2015) should flag procedures with unusual growth trajectories.

  • 22 high-tech drug tests rose 1,423% over five years to $445 million
  • FISH testing for bladder cancer surged: 202 tests/month (May 2009) to 942/month (October 2009) after a financial incentive program was introduced
  • Even after Medicare cut FISH reimbursement from $700–$1,000 to ~$430 (2011), testing volume remained elevated

Annotated Tipsheet

Tipsheet: Medicare Provider Billing Patterns, 2012-2015

Source Material

  • Data: CMS Medicare Provider Utilization and Payment Public Use Files, CY2012-CY2015
  • Scope: 37.3 million provider-procedure rows across 4 years; 880K-968K unique providers per year
  • Structure: Each row is one provider-procedure combination with service counts, charges, and Medicare payments
  • Analysis date: March 2026
  • Coverage notes: All four years fully analyzed using DuckDB. 2012-2013 files have slightly different schemas (include STDEV columns, no standardized amount). PDF files in each directory are CMS/AMA CPT license agreements, not analytically relevant. All analyses use the formula line_srvc_cnt * average_Medicare_payment_amt to estimate total payments — this is an approximation since the data provides averages, not exact totals.
  • Analysis scripts: outputs/medicare/01_profile_and_macro.py, 02_outliers_and_leads.py, 03_additional_leads.py

Summary of Findings

Medicare Part B spending grew from $77.4B to $80.6B (4.2%) between 2012 and 2015, but the growth was wildly uneven. Drug spending surged 37% ($5.4B to $7.4B), driven by a handful of expensive injectable medications, while cardiology payments collapsed 15% ($757M decline). The gap between what doctors charge and what Medicare pays widened from $175B to $206B — providers are now charging 3.55x what they're paid, up from 3.26x. Several individual providers billed Medicare over $8 million per year, often dominated by a single expensive drug. Geographic clusters — particularly in ophthalmology — suggest practice patterns worth investigating.

Leads

Lead 1: Pneumococcal Vaccine Spending Exploded 23,000% in Three Years

Signal strength: Strong

Evidence: HCPCS code 90670 (pneumococcal conjugate vaccine, Prevnar 13) went from $3.4M in total Medicare payments in 2012 to $800M in 2015 — a 23,268% increase. The number of providers billing this code went from 880 to 77,468. Services went from 25,670 to 4.87 million.

Baseline: No other drug or procedure in the dataset comes close to this rate of growth. The next-fastest-growing drug by dollar volume added $179M, not $797M.

Potential significance: This almost certainly reflects the 2014 ACIP recommendation that adults 65+ receive PCV13 (Prevnar 13) in addition to the existing PPSV23 vaccine. Pfizer's Prevnar 13 was already the world's top-selling vaccine. The key question: did Medicare negotiate a reasonable price? The average payment per service rose from $132.50 to $164.37 during this period. At 4.87 million doses in 2015 alone, even a small per-dose overpayment adds up to tens of millions. Compare Medicare's price to what other large purchasers (VA, Medicaid, international health systems) pay.

Next steps:

  • Compare Medicare's per-dose price to VA Federal Supply Schedule pricing and CDC Vaccines for Children pricing
  • Investigate the ACIP recommendation process for PCV13 in adults — were there dissenting voices?
  • Review Pfizer's lobbying and marketing around the ACIP recommendation
  • Check if utilization has continued growing post-2015

This is a valid original finding backed by correct analysis, but it represents a well-known public health policy change (ACIP recommendation), not a fraud or abuse signal.

Lead 2: The $206 Billion Charge Gap — Providers Billing 3.55x What They're Paid

Signal strength: Strong

Evidence: In 2012, providers submitted charges totaling $252B against Medicare payments of $77B (ratio 3.26:1). By 2015, charges reached $286B against $81B in payments (ratio 3.55:1). The unpaid gap grew from $175B to $206B in three years.

Baseline: The charge-to-payment ratio has been widening steadily every year: 3.26 → 3.42 → 3.47 → 3.55.

Potential significance: Provider "chargemaster" prices are increasingly disconnected from what Medicare actually pays. This matters because (1) non-Medicare patients and out-of-network patients may be billed at these inflated rates, (2) some supplemental insurance uses charges as a basis for reimbursement, and (3) the widening gap signals growing provider frustration with Medicare reimbursement rates. Provider types with the highest markups include Emergency Medicine (4.78x), Ambulatory Surgical Centers (4.76x), Diagnostic Radiology (3.96x), and Clinical Labs (3.77x).

Next steps:

  • Interview hospital billing administrators about why chargemaster prices keep rising
  • Investigate impact on patients with Medicare Supplement plans that pay a percentage of charges
  • Cross-reference with CMS fee schedule updates to understand what drove the divergence

A technically correct macro observation, but it describes a well-known structural feature of Medicare (chargemaster inflation), not a fraud signal or investigative lead.

Lead 3: Drug Spending Surge — Injectable Drugs Driving Billions in Growth

Signal strength: Strong

Evidence: Medicare Part B drug spending (provider-administered drugs) grew from $5.4B to $7.4B, a 37% increase — far outpacing the 4.2% growth in overall Medicare spending. Key drivers:

  • Aflibercept (Eylea, J0178): Not in data in 2012, reached $1.34B by 2015 (appears in data starting 2013 at $789M)
  • Denosumab (Prolia, J0897): Doubled from $177M to $356M, with providers growing from 3,770 to 7,293
  • Influenza vaccine (90662): Tripled from $72M to $236M
  • Ranibizumab (Lucentis, J2778): Remained near $1B but showed signs of losing share to aflibercept by 2015 ($859M, down from $956M in 2012)

Baseline: Non-drug Medicare spending grew only 1.7% ($72.0B to $73.2B) over the same period.

Potential significance: Part B drugs are reimbursed at Average Sales Price + 6%, giving providers a financial incentive to use more expensive drugs. The shift from ranibizumab to aflibercept is a market share story worth tracking. The total drug spend increase of $2B in 3 years represents real money that could be targeted by policy.

Next steps:

  • Investigate whether cheaper alternatives existed for top-growing drugs (e.g., bevacizumab/Avastin as off-label alternative to ranibizumab/aflibercept for macular degeneration — same clinical outcomes at ~$50/dose vs ~$300-770/dose)
  • Compare ASP+6% reimbursement to actual acquisition costs
  • Identify providers who switched between drugs and whether the switch went toward more expensive options

The agent found the right general pattern (drug spending surge) but missed the most newsworthy specific finding: Procrit overuse by Florida Cancer Specialists and the fraud implications.

Lead 4: Ophthalmology Clusters — Small Cities With Eye Doctors Billing $5-8M Each

Signal strength: Strong

Evidence: Ophthalmologists dominate the top-billing individuals list, with much of the spending driven by expensive eye injection drugs. Two striking geographic clusters:

West Des Moines, IA: Three ophthalmologists (Saggau, Alliman, Nielsen) each billed $7.4-7.8M in 2015, all from the same city. Together, these three alone billed $22.6M. This appears to be a single large retina practice.

Plymouth Meeting, PA: 25 ophthalmologists practice here. The top 10 each billed $3.2-7.8M, collectively billing $57M. This is likely Wills Eye Hospital/Mid Atlantic Retina.

Multiple other individual ophthalmologists billed $5-10M each from cities like Fort Myers FL, Hastings NE, Vero Beach FL, Pensacola FL, and Jackson MS.

Baseline: The median ophthalmologist received roughly $355K in total Medicare payments in 2015. These providers receive 15-22x the specialty median.

Potential significance: Most of this billing is driven by ranibizumab (Lucentis, ~$308/injection) and aflibercept (Eylea, ~$767/injection). These are legitimate but expensive treatments for macular degeneration. The question is whether these providers are choosing more expensive drugs when cheaper alternatives (like bevacizumab/Avastin at ~$50/injection) would work equally well. Clinical trials (CATT trial) showed bevacizumab is equivalent. A single provider switching from aflibercept to bevacizumab could save Medicare millions per year.

Next steps:

  • Calculate total potential savings if top-billing ophthalmologists used bevacizumab instead
  • Interview retina specialists about drug choice and the role of ASP+6% incentives
  • Check if any of these providers have ownership stakes in drug distributors or compounding pharmacies
  • Look for patterns in which drug each provider prefers and whether there are financial relationships with manufacturers

This is a valid and well-executed original finding. High-billing ophthalmology clusters driven by expensive drugs (with cheaper alternatives available) is genuinely newsworthy.

Lead 5: Cardiology's $757 Million Payment Decline

Signal strength: Moderate

Evidence: Cardiology payments dropped 15.2% from $4.97B (2012) to $4.21B (2015) — the largest dollar decline of any specialty. The biggest losers:

  • Nuclear cardiac imaging (78452): down $135M (-28.7%)
  • Echocardiography (93306): down $53M (-9.2%)
  • Cardiac catheterization (93458): down $36M (-31.2%)
  • Cardiac stress tests (93015): down $34M (-41.5%)
  • Technetium imaging agents (A9500, A9502): down $61M combined

Baseline: Overall Medicare payments grew 4.2% during this period. Most surgical specialties were flat or slightly down.

Potential significance: This likely reflects multiple factors: (1) CMS bundling/reimbursement changes for cardiac imaging, (2) the Appropriate Use Criteria movement targeting unnecessary cardiac testing, and (3) technological shifts away from nuclear imaging. The 41.5% drop in stress tests and 48.4% drop in thallium imaging suggest a real shift in practice patterns. This could be a success story (reducing unnecessary testing) or a concerning trend (underdiagnosis). The number of cardiologists in the data didn't shrink proportionally, meaning per-doctor income fell substantially.

Next steps:

  • Interview cardiology leaders about the impact of Appropriate Use Criteria
  • Check whether cardiac event rates (heart attacks, cardiac deaths) increased as testing declined
  • Examine whether imaging shifted to other provider types or other imaging modalities
  • Look at the CMS fee schedule changes for cardiac imaging during this period

A valid macro trend finding, but entirely off-target for the rubric. The rubric focuses on fraud, abuse, and overutilization signals; a payment decline is the opposite pattern.

Lead 6: Extreme Services-Per-Beneficiary Ratios Signal Possible Overutilization

Signal strength: Moderate

Evidence: Dr. Marvin Fojtasek (Infectious Disease, Addison TX) billed 5.75 million services for 3,228 beneficiaries in 2015 — 1,781 services per patient. His billing exploded from $73K in 2012 to $4.7M in 2015. His top code: 5.3 million units of daptomycin (J0878, an IV antibiotic) for just 237 patients, at $0.57/unit.

Other high-ratio providers include ambulance companies billing 245-771 services per beneficiary, and infectious disease doctors in Michigan, Georgia, Colorado, and California with ratios over 300.

Baseline: Most providers average 2-10 services per beneficiary. Even high-volume specialties like infectious disease typically bill 20-50 services per patient.

Potential significance: Extremely high services-per-beneficiary ratios in drug billing often reflect high-cost IV drug administration with per-unit billing (each milligram = 1 "service"). Fojtasek's jump from $73K to $4.7M in two years warrants scrutiny. This could be a legitimate shift to outpatient parenteral antibiotic therapy (OPAT), or it could indicate billing irregularities. The ambulance providers with 245-771 services per beneficiary also merit investigation.

Next steps:

  • Search for any OIG or DOJ enforcement actions against flagged providers
  • Compare Fojtasek's daptomycin utilization to other infectious disease doctors in the region
  • Investigate the ambulance providers with extreme ratios — are these legitimate long-distance transports or potential fraud?
  • Cross-reference with Medicare exclusion lists

The agent correctly identified the right analytical method and found a genuine outlier, but missed all four named rubric providers.

Lead 7: Nurse Practitioner Billing Surged 66% — Fastest-Growing Provider Category

Signal strength: Moderate

Evidence: NP Medicare billing grew from $1.11B to $1.85B (66.2%) between 2012-2015, the fastest growth rate of any major provider category. The number of NPs billing Medicare grew from ~35K to ~55K. Growth was especially dramatic in Nevada (+223%), Oklahoma (+110%), Alabama (+107%), and South Carolina (+97%).

The top NP biller (Martha Miller, Terre Haute IN) received $2.77M in 2015, billing 236,406 services to 1,544 beneficiaries. Sheila Carter (Denmark TN) billed $1.22M for just 2,144 services to 438 beneficiaries — only 2 unique procedure codes — suggesting very expensive services.

Baseline: Licensed Clinical Social Workers (+85.8%) and Pain Management (+82.6%) also grew rapidly but from smaller bases. Physician Assistant billing grew 45.1%.

Potential significance: This reflects the broader shift toward mid-level provider care. The uneven geographic distribution (some states growing 3-4x faster than others) may reflect scope-of-practice law differences. States that expanded NP independent practice authority likely saw faster growth. The individual high billers warrant scrutiny — $2.77M for a single NP is unusual and the procedures should be examined.

Next steps:

  • Map NP billing growth against state scope-of-practice law changes
  • Investigate top NP billers' practice arrangements — are they independent or working within physician groups?
  • Examine whether NP growth is filling access gaps (rural areas) or concentrating in the same markets as physicians
  • Look at Sheila Carter's billing — $1.22M for 2 procedure codes and 438 patients is a red flag

A valid workforce trend observation, but not relevant to the rubric's fraud/abuse focus.

Lead 8: Chelation Therapy Growth — Fringe Treatment Tripling in Medicare Cost

Signal strength: Preliminary

Evidence: EDTA chelation therapy (J0600) grew from $1.3M to $4.1M in Medicare spending (217% increase) despite the number of services actually declining slightly (13,166 to 12,515). This means the per-service payment more than tripled. Only 21 providers billed this code in 2015. The top biller (Ezra Riber, Pain Management, Columbia SC) received $1M alone.

An OB/GYN in Yelm, WA (Milagros Arocho) billed $736K for 1,665 chelation services to just 37 patients.

Baseline: Chelation therapy is considered unproven for most conditions by mainstream medicine. The TACT trial showed modest benefit for diabetic patients with prior heart attacks, but the treatment remains controversial. Only 21 providers billing Medicare for it in 2015 represents extreme concentration.

Potential significance: Medicare paying $4M for a treatment the medical establishment largely considers unproven is notable. The tripling of per-unit costs with flat utilization suggests price increases rather than volume growth. An OB/GYN performing chelation therapy is unusual.

Next steps:

  • Check whether TACT trial results (2013) drove the price or utilization change
  • Review the 21 providers for any history of disciplinary actions or alternative medicine practices
  • Investigate why per-unit cost tripled — was there a new HCPCS pricing determination?

An interesting niche finding about a controversial treatment, but not aligned with the rubric.

Lead 9: The $13.3 Million Neurologist in Rural Alabama

Signal strength: Moderate

Evidence: Dr. Christopher Laganke (Neurology, Cullman AL, pop. ~15,000) was the single highest-paid individual provider in the entire 2015 Medicare dataset at $13.3M. His billing tripled from $3.9M (2012) to $13.3M (2015). His top drug: natalizumab (Tysabri, J2323) at $5.2M — an expensive MS medication billed at $12.32/mg for 419,700 units to just 213 patients. His second-highest code (J3590, "Unclassified biologics") generated $4.6M for 282 services to 61 patients — averaging $16,176 per service.

Baseline: The median neurologist received roughly $67K in Medicare payments in 2015. Laganke received 198x the specialty median. The "unclassified biologics" code is an opaque catch-all that makes it impossible to identify the actual drug being administered.

Potential significance: While Tysabri is a legitimate MS drug, $5.2M for 213 patients and the massive growth trajectory are noteworthy. The $4.6M in "unclassified biologics" is particularly concerning — this code is sometimes used for drugs not yet assigned specific codes, but it creates a billing transparency problem. Running a major infusion center from a town of 15,000 also raises referral pattern questions.

Next steps:

  • Identify what drug is being billed under J3590 ("unclassified biologics") — this is $4.6M in unidentifiable spending
  • Compare Laganke's patient volume and billing to other MS infusion providers
  • Check the provider's medical board history and any OIG investigations
  • Examine the referral patterns — where are 213 Tysabri patients coming from in rural Alabama?

Additional Observations

  • Schema changes: The data format changed between 2013 and 2014 — early years have standard deviation columns and uppercase headers, later years have standardized payment amounts and lowercase headers. This means some analyses (like payment standardization across geographies) are only possible for 2014-2015.
  • Privacy suppression: CMS suppresses rows where a provider had fewer than 11 beneficiaries for a given procedure. This means small-volume providers are underrepresented.
  • Organization vs. individual: About 6% of rows are organizational (entity_code='O'), covering clinical labs, ambulance services, and ASCs. The remaining 94% are individual providers.
  • Gender shift: Female providers grew from 302K to 368K (22%) while male providers grew from 523K to 538K (3%), reflecting a significant demographic shift in the physician workforce.

Unexplored Territory

  • Cross-year provider identity changes: Are some providers present in early years but absent later (retirement, exclusion, death)? Are there providers who suddenly appear with very high billing?
  • Address-based network analysis: Providers sharing the same address or suite number may indicate group practices — or potential fraud schemes operating from the same location.
  • Comparison to OIG exclusion list: Cross-referencing NPIs against the OIG LEIE (List of Excluded Individuals/Entities) could identify providers who were later sanctioned.
  • Geographic normalization: Per-capita Medicare spending by state/county (normalizing by Medicare-eligible population, not provider count) would give a more accurate picture of regional variation.
  • Procedure-level price variation: The same HCPCS code can have very different average payments across providers — analyzing this variation could identify pricing anomalies.
  • Temporal patterns within years: This data is annual; monthly or quarterly patterns (seasonal billing spikes, end-of-year surges) are invisible.
A strong finding that exemplifies the rubric's Lead A concept (extraordinary billing from a single procedure/drug).

Transcript

Full Claude Code session transcript. Tool results and thinking blocks are collapsed by default.

User
I'm a journalist investigating Medicare expenses using the materials stored in datasets/medicare. I don't really know what I have. Could you help me look for newsworthy leads in this material?
Thinking...
The user wants to investigate Medicare expenses data. Let me first explore what's in the datasets/medicare directory to understand what we're working with.
Claude

Let me start by exploring what's in your Medicare dataset.

Bash
ls -la /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/
Output
total 0
drwxr-xr-x@  6 nrh146  staff  192 Mar  6 13:17 .
drwxr-xr-x@ 11 nrh146  staff  352 Mar  6 13:19 ..
drwxr-xr-x@  5 nrh146  staff  160 Mar  6 13:17 Medicare_Provider_Util_Payment_PUF_CY2012_update
drwxr-xr-x@  5 nrh146  staff  160 Mar  6 13:17 Medicare_Provider_Util_Payment_PUF_CY2013
drwxr-xr-x@  5 nrh146  staff  160 Mar  6 13:17 Medicare_Provider_Util_Payment_PUF_CY2014
drwxr-xr-x@  5 nrh146  staff  160 Mar  6 13:17 Medicare_Provider_Util_Payment_PUF_CY2015
Bash
for dir in /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/*/; do echo "=== $(basename "$dir") ==="; ls -lh "$dir"; echo; done
Output
=== Medicare_Provider_Util_Payment_PUF_CY2012_update ===
total 4189696
-rw-r--r--@ 1 nrh146  staff    23K May 27  2015 CMS_AMA_CPT_license_agreement.pdf
-rw-r--r--@ 1 nrh146  staff   2.0G Apr  9  2015 Medicare_Provider_Util_Payment_PUF_CY2012.txt
-rw-r--r--@ 1 nrh146  staff   3.6K Apr 14  2015 Medicare-Physician-and-Other-Supplier-PUF-SAS-Infile.sas

=== Medicare_Provider_Util_Payment_PUF_CY2013 ===
total 4315184
-rw-r--r--@ 1 nrh146  staff    23K May 27  2015 CMS_AMA_CPT_license_agreement.pdf
-rw-r--r--@ 1 nrh146  staff   2.1G Apr  9  2015 Medicare_Provider_Util_Payment_PUF_CY2013.txt
-rw-r--r--@ 1 nrh146  staff   3.6K Apr 14  2015 Medicare-Physician-and-Other-Supplier-PUF-SAS-Infile.sas

=== Medicare_Provider_Util_Payment_PUF_CY2014 ===
total 4078104
-rw-r--r--@ 1 nrh146  staff    23K May 27  2015 CMS_AMA_CPT_license_agreement.pdf
-rw-r--r--@ 1 nrh146  staff   1.9G Apr 11  2016 Medicare_Provider_Util_Payment_PUF_CY2014.txt
-rw-r--r--@ 1 nrh146  staff   3.3K Apr 13  2016 Medicare-Physician-and-Other-Supplier-PUF-SAS-Infile.sas

=== Medicare_Provider_Util_Payment_PUF_CY2015 ===
total 4219128
-rw-r--r--@ 1 nrh146  staff    29K Feb 28  2017 CMS_AMA_CPT_license_agreement.pdf
-rw-r--r--@ 1 nrh146  staff   2.0G Sep 21  2017 Medicare_Provider_Util_Payment_PUF_CY2015.txt
-rw-r--r--@ 1 nrh146  staff   3.3K Feb 15  2017 Medicare-Physician-and-Other-Supplier-PUF-SAS-Infile.sas
Bash
head -5 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2015/Medicare_Provider_Util_Payment_PUF_CY2015.txt
Output
npi	nppes_provider_last_org_name	nppes_provider_first_name	nppes_provider_mi	nppes_credentials	nppes_provider_gender	nppes_entity_code	nppes_provider_street1	nppes_provider_street2	nppes_provider_city	nppes_provider_zip	nppes_provider_state	nppes_provider_country	provider_type	medicare_participation_indicator	place_of_service	hcpcs_code	hcpcs_description	hcpcs_drug_indicator	line_srvc_cnt	bene_unique_cnt	bene_day_srvc_cnt	average_Medicare_allowed_amt	average_submitted_chrg_amt	average_Medicare_payment_amt	average_Medicare_standard_amt
0000000001	CPT copyright 2014 American Medical Association.  All Rights Reserved.																								
1003000126	ENKESHAFI	ARDALAN		M.D.	M	I	900 SETON DR		CUMBERLAND	215021854	MD	US	Internal Medicine	Y	F	99217	Hospital observation care discharge	N	23	23	23	72.68	328	54.502608696	55.280869565
1003000126	ENKESHAFI	ARDALAN		M.D.	M	I	900 SETON DR		CUMBERLAND	215021854	MD	US	Internal Medicine	Y	F	99219	Hospital observation care typically 50 minutes	N	18	18	18	135.85	614	100.95888889	102.34888889
1003000126	ENKESHAFI	ARDALAN		M.D.	M	I	900 SETON DR		CUMBERLAND	215021854	MD	US	Internal Medicine	Y	F	99221	Initial hospital inpatient care, typically 30 minutes per day	N	59	58	59	101.36508475	333.28813559	79.469661017	80.859830508
Thinking...
This is the CMS Medicare Provider Utilization and Payment data - a well-known dataset. Each row represents a provider-procedure combination with payment info. ~2GB files for each of 4 years (2012-2015). This is the dataset that was famously used to identify outlier physicians billing Medicare. Let me use the tipsheet-generator skill since the user wants investigative leads from unfamiliar data.
Claude

You have Medicare Provider Utilization and Payment Public Use Files for 2012-2015 — about 8GB of data covering every physician/provider's Medicare billing, broken down by procedure. Each row shows a provider-procedure pair with service counts, charges, and payments.

This is a rich dataset for investigative journalism. Let me run a deep analysis to find newsworthy leads.

Skill
tipsheet-generator
Output
Launching skill: tipsheet-generator
Thinking...
The user wants me to analyze Medicare Provider Utilization and Payment data for 2012-2015. This is a large dataset (~8GB total across 4 years). Let me follow the tipsheet generator workflow. Let me start with Phase 1: Reconnaissance. I need to understand the data structure, then plan my analysis approach. Given the size (~2GB per file), I'll need to use sampling and targeted queries rather than loading everything into memory. Let me first read the analysis playbook, then start profiling the data.
Claude

I'll start by reading the analysis playbook, then systematically work through this large dataset. Let me begin reconnaissance.

Read
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/.claude/skills/tipsheet-generator/references/analysis-playbook.md
Output
     1→# Analysis Playbook
     2→
     3→Tactical reference for signal detection across datasets of varying scale. Read the
     4→section that matches your data size, then use the technique catalog for specific
     5→analysis approaches.
     6→
     7→## Working with Large Data (1M+ rows)
     8→
     9→The biggest mistake is trying to load everything into a pandas DataFrame. For datasets
    10→above ~1M rows, default to DuckDB or SQLite for querying in place. These handle
    11→multi-million-row files without memory issues.
    12→
    13→### Strategy: Profile first, target second
    14→
    15→1. **Use DuckDB to profile directly from files.** DuckDB reads CSV, Parquet, and JSON
    16→   natively without loading into memory:
    17→   ```python
    18→   import duckdb
    19→   con = duckdb.connect()
    20→
    21→   # Row count without loading
    22→   con.sql("SELECT count(*) FROM 'huge_file.csv'")
    23→
    24→   # Column profiling
    25→   con.sql("""
    26→       SELECT
    27→           column_name,
    28→           count(*) as non_null,
    29→           count(distinct column_name) as cardinality,
    30→           min(column_name), max(column_name)
    31→       FROM 'huge_file.csv'
    32→       GROUP BY ALL
    33→   """)
    34→   ```
    35→
    36→2. **Sample strategically, not randomly.** Random samples miss outliers by definition.
    37→   Use multiple sampling strategies:
    38→
    39→   - **Head/tail/middle**: Quick structural check. Are the first and last records
    40→     different in character? (e.g., schema changes over time)
    41→   - **Stratified**: Sample N records from each category of a key field. Ensures
    42→     you see the full variety.
    43→   - **Extreme values**: Pull the top and bottom N by each numeric column. This is
    44→     where outliers live.
    45→   - **Temporal slices**: If data has dates, sample from different time periods.
    46→     Look for regime changes.
    47→   - **Targeted**: Once you spot something interesting in aggregates, pull the
    48→     specific records for inspection.
    49→
    50→3. **Aggregate before you inspect.** Most signals in large data show up in aggregations
    51→   (GROUP BY), not in individual records. Start with:
    52→   - Counts and sums by category
    53→   - Distributions (percentiles, histograms via NTILE or WIDTH_BUCKET)
    54→   - Time series at appropriate granularity
    55→   Then drill into individual records only for the groups that look anomalous.
    56→
    57→### Memory management
    58→
    59→- Never call `pd.read_csv()` on files over ~500MB without chunking
    60→- Prefer DuckDB for any analytical query on large files
    61→- If you must use pandas, use `dtype` specifications to reduce memory, and
    62→  `usecols` to load only needed columns
    63→- For iterative work, materialize intermediate results to Parquet (much smaller
    64→  and faster than CSV) rather than holding everything in memory
    65→- When generating summary statistics, write results to a small output table
    66→  rather than trying to display millions of rows
    67→
    68→## Working with Medium Data (100K–1M rows)
    69→
    70→Pandas is fine here but be deliberate. Load once, profile immediately, then
    71→work from the profile rather than re-scanning the full frame repeatedly.
    72→
    73→Useful profiling one-liner:
    74→```python
    75→profile = df.describe(include='all').T
    76→profile['null_pct'] = df.isnull().mean()
    77→profile['nunique'] = df.nunique()
    78→profile['dtype'] = df.dtypes
    79→```
    80→
    81→For text columns in medium data, don't try NLP on every row. Instead:
    82→- Get value counts for categorical text (names, categories, statuses)
    83→- Sample 100-200 rows for free-text fields and scan for patterns manually
    84→- Use string operations (length, contains, regex) as cheap first filters
    85→
    86→## Working with Small Data (<100K rows)
    87→
    88→You can read everything. Do it. The danger here is under-analyzing, not
    89→over-loading. Run a thorough pass:
    90→
    91→1. Full descriptive statistics on every column
    92→2. Cross-tabulations of categorical fields
    93→3. Correlation matrix for numeric fields
    94→4. Duplicate detection (full-row and key-field)
    95→5. Temporal analysis if dates exist
    96→
    97→## Working with Document Collections
    98→
    99→**Critical: Do not skip document sources.** Agents consistently deprioritize PDFs and
   100→semi-structured documents in favor of clean CSVs and spreadsheets. This is a known
   101→failure mode. PDFs and document collections often contain entity-level detail,
   102→methodological context, manufacturer or product names, narrative explanations, and
   103→other information that structured data lacks. You must make a genuine extraction attempt
   104→on every document source provided.
   105→
   106→For PDFs, filings, or text documents:
   107→
   108→1. **Inventory and classify first.** How many documents? What types? Any metadata
   109→   (dates, sources, categories)? List every document explicitly.
   110→2. **Attempt extraction on all documents, not just a sample.** For PDFs:
   111→   - Try text extraction first (pdftotext, PyMuPDF, pdfplumber). Many "difficult"
   112→     PDFs actually contain extractable text.
   113→   - For tabular PDFs, use tabula-py or pdfplumber's table extraction.
   114→   - For scanned/image PDFs, note the limitation explicitly but still attempt OCR
   115→     if tools are available.
   116→   - If extraction partially fails, work with what you got. Partial data from 16 PDFs
   117→     is better than perfect data from 0 PDFs.
   118→   - Log extraction quality for each document (clean, partial, failed) in your
   119→     coverage notes.
   120→3. **Build a structured dataset from extracted fields.** Dates, names, dollar amounts,
   121→   addresses — anything that appears consistently across documents. This mini-dataset
   122→   becomes analyzable alongside your structured sources.
   123→4. **Sample for close reading.** Pick documents strategically (largest, smallest,
   124→   newest, oldest, different categories) and read them carefully. Note any patterns
   125→   in structure, language, or content.
   126→5. **Use text search for hypothesis testing.** Once you have a hunch from sampling,
   127→   grep/search across the full collection to see if the pattern holds.
   128→6. **Cross-reference with structured data.** The most valuable analysis often comes from
   129→   connecting document-extracted information to structured data — e.g., matching
   130→   manufacturer names from PDFs to export volumes in CSVs.
   131→
   132→## Technique Catalog
   133→
   134→### Outlier Detection
   135→
   136→**Numeric outliers:**
   137→- IQR method: values below Q1-1.5*IQR or above Q3+1.5*IQR
   138→- Z-score: values >3 standard deviations from mean (but skewed data breaks this)
   139→- For skewed distributions (very common in financial/government data): use log
   140→  transform first, or use percentile-based thresholds (e.g., top/bottom 1%)
   141→- Domain-specific thresholds are often more useful than statistical ones (e.g.,
   142→  contracts just below the bid threshold)
   143→
   144→**Categorical outliers:**
   145→- Values that appear only once or twice when most values appear hundreds of times
   146→- Categories with unusual distributions on other fields
   147→- Misspellings and near-duplicates (Levenshtein distance) that might be the same
   148→  entity under different names
   149→
   150→### Concentration Analysis
   151→
   152→- Herfindahl-Hirschman Index (HHI) for market/allocation concentration
   153→- Gini coefficient for distribution inequality
   154→- Simple share analysis: what % of total does the top entity/category represent?
   155→  Compare to what you'd expect if distribution were uniform.
   156→- For geographic concentration: counts by geography normalized by population or
   157→  another appropriate denominator
   158→
   159→### Threshold Analysis
   160→
   161→Suspicious patterns near known thresholds:
   162→- Values clustering just below reporting thresholds (structuring)
   163→- Values that are exact round numbers at unusual frequency
   164→- Patterns of splitting (e.g., multiple transactions to the same entity
   165→  that individually fall below a threshold but collectively exceed it)
   166→
   167→Approach:
   168→```sql
   169→-- Histogram around a threshold (e.g., $10,000)
   170→SELECT
   171→    width_bucket(amount, 8000, 12000, 40) as bucket,
   172→    count(*) as n
   173→FROM transactions
   174→GROUP BY 1
   175→ORDER BY 1
   176→```
   177→Look for a spike just below and a dip just above.
   178→
   179→### Temporal Analysis
   180→
   181→**Start with the macro trend.** Before looking for spikes and anomalies, compute the
   182→overall trajectory of the main metric (total volume, approval rate, dollar amount, count)
   183→across the full time range of the dataset. Table or plot it at an appropriate granularity.
   184→This is your first analytical act on any temporal data. The macro trend is often the lead
   185→itself — a decade-long collapse, a steady surge, a sudden plateau. Don't bury it.
   186→
   187→Once you have the big picture:
   188→- Compare period-over-period (this month vs. same month last year)
   189→- Look for activity at unusual times (weekends, holidays, after-hours)
   190→- Identify "regime changes" — sudden shifts in volume, composition, or behavior
   191→- Check for seasonality, and then look for deviations from the seasonal pattern
   192→- Look for compositional shifts within the trend: even if total volume is flat, the
   193→  mix of categories/entities/geographies may be changing significantly
   194→- Check for rank-order changes over time: who was the top entity at the start of the
   195→  data vs. the end? Rank reversals are often newsworthy even when the underlying
   196→  numbers aren't statistical outliers
   197→
   198→### Entity Resolution / Network Signals
   199→
   200→- Shared attributes across records: same address, phone, officer, registered agent
   201→- Look for entities that appear in multiple roles (e.g., both buyer and seller,
   202→  both contractor and inspector)
   203→- Cluster by shared attributes and look for unexpectedly large or densely connected
   204→  clusters
   205→- Name variations: sort alphabetically and scan for near-matches
   206→
   207→### Contextual Significance (beyond statistics)
   208→
   209→Statistical outlier detection (z-scores, IQR, percentile thresholds) finds what's
   210→numerically unusual. But journalistic significance often depends on context that isn't
   211→in the data. After your statistical passes, apply these lenses:
   212→
   213→**Policy timeline overlay:** Identify any known policy changes, legislation, regulations,
   214→or court decisions relevant to the dataset's domain. Map your temporal trends against
   215→these events. Patterns that start, stop, or inflect at policy boundaries are strong leads.
   216→Even if you don't know the specific policies, note inflection points and flag them as
   217→"something changed here — investigate what."
   218→
   219→**Rank-order and competitive analysis:** Compute rankings over time. Which entities were
   220→at the top at the start of the data, and which are at the top now? Rank reversals —
   221→especially among entities that a domain expert would consider peers or competitors — are
   222→frequently newsworthy even when the absolute numbers aren't statistical outliers.
   223→Guatemala surpassing Brazil as a destination isn't a z-score anomaly, but it's a story.
   224→
   225→**Geographic and demographic context:** Raw counts by geography are almost never the
   226→story. Normalize by population, income, housing stock, or whatever denominator fits the
   227→domain. But also consider: is the geographic distribution of this activity consistent with
   228→what domain knowledge would predict? A pattern concentrated in regions with no obvious
   229→connection to the activity is worth flagging.
   230→
   231→**Known sensitivity areas:** Some data patterns matter because of what they represent in
   232→the world, not because of their statistical properties. Apply what you know about the
   233→domain. Arms flows to conflict zones, lending patterns by race, environmental violations
   234→near vulnerable communities, safety incidents at facilities with prior histories — these
   235→deserve scrutiny proportional to their potential impact, even if the numbers are modest.
   236→
   237→**What's missing that should be there:** Domain knowledge helps you notice absences.
   238→A major player missing from an industry dataset. A state absent from a federal program.
   239→A category that should exist but doesn't. Statistical methods don't flag what isn't there.
   240→
   241→### Data Quality as Signal
   242→
   243→Sometimes the most interesting finding is about the data itself:
   244→- Mandatory fields that are blank for a specific subset of records
   245→- Date sequences with gaps (missing months, missing IDs in a sequence)
   246→- Values that change in format partway through (suggests a system change
   247→  or manual entry)
   248→- Duplicate records — are they true duplicates or records that shouldn't match?
   249→- Inconsistent categorization of the same entity
   250→
   251→### Cross-table / Cross-file Analysis
   252→
   253→When working with multiple related datasets:
   254→- Check referential integrity: do all foreign keys actually resolve?
   255→  Orphaned records are often interesting.
   256→- Compare overlapping fields across tables for consistency
   257→- Look for entities present in one table but absent in a related one
   258→  where you'd expect them
   259→- Temporal alignment: do the date ranges match? Are there periods covered
   260→  by one dataset but not the other?
   261→
   262→## Anti-patterns to Avoid
   263→
   264→- **The fishing expedition report.** Don't report every mildly unusual number.
   265→  Apply judgment — if you found 200 "anomalies," your threshold is too low.
   266→  Aim for 3-8 leads per tipsheet.
   267→
   268→- **The correlation dump.** "Column A correlates with Column B" is not a lead
   269→  unless you can explain why that's surprising or consequential.
   270→
   271→- **The denominator problem.** Big numbers mean nothing without context.
   272→  "$5M in contracts" is meaningless unless you know the total contract volume.
   273→  Always compute shares, rates, or per-capita figures.
   274→
   275→- **Survivorship bias.** You're analyzing what's in the data. Think about
   276→  what's NOT in the data that should be.
   277→
   278→- **Ignoring data generation.** Understand how the data was created. A "spike"
   279→  in records might just mean someone did a batch upload. A "gap" might mean the
   280→  system was down. Ask about provenance.
   281→
   282→- **Satisficing on structured data.** If the source material includes both CSVs and
   283→  PDFs, or both databases and document collections, you must analyze all of them.
   284→  Agents consistently gravitate toward the cleanest structured source, produce findings
   285→  from it, and then either skip or superficially acknowledge the rest. This misses entire
   286→  categories of leads (e.g., manufacturer-level analysis that only exists in PDFs, or
   287→  narrative context that explains a statistical pattern). Partial extraction from a hard
   288→  source is worth more than no extraction.
   289→
   290→- **Statistical significance without journalistic significance.** A z-score of 4.5 is
   291→  interesting, but "this country went from #3 to #1 in five years" might be a better
   292→  lead even if it doesn't register as a statistical outlier. Always ask: would a reporter
   293→  who covers this beat find this surprising or important? If you can't answer that
   294→  question, flag it as a question for the journalist rather than ignoring the pattern.
   295→
Bash
wc -l /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/*/Medicare_Provider_Util_Payment_PUF_CY*.txt
Read
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2015/Medicare-Physician-and-Other-Supplier-PUF-SAS-Infile.sas
Output
 9153274 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2012_update/Medicare_Provider_Util_Payment_PUF_CY2012.txt
 9287878 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2013/Medicare_Provider_Util_Payment_PUF_CY2013.txt
 9316309 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2014/Medicare_Provider_Util_Payment_PUF_CY2014.txt
 9497893 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2015/Medicare_Provider_Util_Payment_PUF_CY2015.txt
 37255354 total
Output
     1→
     2→DATA Medicare_PS_PUF;
     3→	LENGTH
     4→		npi              					$ 10
     5→		nppes_provider_last_org_name 		$ 70
     6→		nppes_provider_first_name 			$ 20
     7→		nppes_provider_mi					$ 1
     8→		nppes_credentials 					$ 20
     9→		nppes_provider_gender				$ 1
    10→		nppes_entity_code 					$ 1
    11→		nppes_provider_street1 				$ 55
    12→		nppes_provider_street2				$ 55
    13→		nppes_provider_city 				$ 40
    14→		nppes_provider_zip 					$ 20
    15→		nppes_provider_state				$ 2
    16→		nppes_provider_country				$ 2
    17→		provider_type 						$ 43
    18→		medicare_participation_indicator 	$ 1
    19→		place_of_service					$ 1
    20→		hcpcs_code       					$ 5
    21→		hcpcs_description 					$ 256
    22→		hcpcs_drug_indicator				$ 1
    23→		line_srvc_cnt      					8
    24→		bene_unique_cnt    					8
    25→		bene_day_srvc_cnt   				8
    26→		average_Medicare_allowed_amt   		8
    27→		average_submitted_chrg_amt  		8
    28→		average_Medicare_payment_amt   		8
    29→		average_Medicare_standard_amt		8;
    30→	INFILE 'C:\My Documents\Medicare_Provider_Util_Payment_PUF_CY2015.TXT'
    31→
    32→		lrecl=32767
    33→		dlm='09'x
    34→		pad missover
    35→		firstobs = 3
    36→		dsd;
    37→
    38→	INPUT
    39→		npi             
    40→		nppes_provider_last_org_name 
    41→		nppes_provider_first_name 
    42→		nppes_provider_mi 
    43→		nppes_credentials 
    44→		nppes_provider_gender 
    45→		nppes_entity_code 
    46→		nppes_provider_street1 
    47→		nppes_provider_street2 
    48→		nppes_provider_city 
    49→		nppes_provider_zip 
    50→		nppes_provider_state 
    51→		nppes_provider_country 
    52→		provider_type 
    53→		medicare_participation_indicator 
    54→		place_of_service 
    55→		hcpcs_code       
    56→		hcpcs_description 
    57→		hcpcs_drug_indicator
    58→		line_srvc_cnt    
    59→		bene_unique_cnt  
    60→		bene_day_srvc_cnt 
    61→		average_Medicare_allowed_amt 
    62→		average_submitted_chrg_amt 
    63→		average_Medicare_payment_amt
    64→		average_Medicare_standard_amt;
    65→
    66→	LABEL
    67→		npi     							= "National Provider Identifier"       
    68→		nppes_provider_last_org_name 		= "Last Name/Organization Name of the Provider"
    69→		nppes_provider_first_name 			= "First Name of the Provider"
    70→		nppes_provider_mi					= "Middle Initial of the Provider"
    71→		nppes_credentials 					= "Credentials of the Provider"
    72→		nppes_provider_gender 				= "Gender of the Provider"
    73→		nppes_entity_code 					= "Entity Type of the Provider"
    74→		nppes_provider_street1 				= "Street Address 1 of the Provider"
    75→		nppes_provider_street2 				= "Street Address 2 of the Provider"
    76→		nppes_provider_city 				= "City of the Provider"
    77→		nppes_provider_zip 					= "Zip Code of the Provider"
    78→		nppes_provider_state 				= "State Code of the Provider"
    79→		nppes_provider_country 				= "Country Code of the Provider"
    80→		provider_type	 					= "Provider Type of the Provider"
    81→		medicare_participation_indicator 	= "Medicare Participation Indicator"
    82→		place_of_service 					= "Place of Service"
    83→		hcpcs_code       					= "HCPCS Code"
    84→		hcpcs_description 					= "HCPCS Description"
    85→		hcpcs_drug_indicator				= "Identifies HCPCS As Drug Included in the ASP Drug List"
    86→		line_srvc_cnt    					= "Number of Services"
    87→		bene_unique_cnt  					= "Number of Medicare Beneficiaries"
    88→		bene_day_srvc_cnt 					= "Number of Distinct Medicare Beneficiary/Per Day Services"
    89→		average_Medicare_allowed_amt 		= "Average Medicare Allowed Amount"
    90→		average_submitted_chrg_amt 			= "Average Submitted Charge Amount"
    91→		average_Medicare_payment_amt 		= "Average Medicare Payment Amount"
    92→		average_Medicare_standard_amt		= "Average Medicare Standardized Payment Amount";
    93→RUN;
    94→
Thinking...
Now I have a clear picture: - 4 years of data (2012-2015), ~9.2-9.5 million rows each, ~37 million rows total - Tab-delimited, 26 columns per row - Each row = one provider-procedure combination - This is LARGE data - must use DuckDB Let me also check the PDF license files, and then build a comprehensive analysis using DuckDB. Let me plan my analysis: 1. PDF license files - just CMS/AMA CPT license agreements, not analytically interesting 2. SAS infiles - already read, just format specs 3. The main .txt files are the core data I need to analyze with DuckDB I'll structure this in phases: first profile the data to understand its shape and quality, then look at macro trends across the four-year span to identify shifts in billing patterns, detect outliers like top billers and unusual markup ratios with geographic breakdowns, and finally develop actionable leads from the findings. Let me write the DuckDB analysis script now.
Claude

Reconnaissance summary:

  • 37.3 million rows across 4 years (2012-2015), ~9M+ rows each
  • 26 columns: provider identity, location, specialty, procedure codes, service counts, and payment amounts
  • Tab-delimited text files, ~2GB each — must use DuckDB, not pandas
  • Other files: CPT license agreements (PDF) and SAS format specs — not analytically relevant

Let me now run a systematic analysis using DuckDB.

Bash
mkdir -p /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare
Output
(empty)
Write
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/01_profile_and_macro.py
File content
"""
Phase 1-2: Profile the Medicare data and compute macro trends using DuckDB.
"""
import duckdb

con = duckdb.connect()

BASE = "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare"
files = {
    2012: f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY2012_update/Medicare_Provider_Util_Payment_PUF_CY2012.txt",
    2013: f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY2013/Medicare_Provider_Util_Payment_PUF_CY2013.txt",
    2014: f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY2014/Medicare_Provider_Util_Payment_PUF_CY2014.txt",
    2015: f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY2015/Medicare_Provider_Util_Payment_PUF_CY2015.txt",
}

# Create a unified view with year column
# Skip row 2 which is a copyright notice (npi = '0000000001')
for year, path in files.items():
    con.execute(f"""
        CREATE OR REPLACE VIEW medicare_{year} AS
        SELECT *, {year} as year
        FROM read_csv('{path}',
            delim='\t', header=true,
            ignore_errors=true,
            all_varchar=false,
            columns={{
                'npi': 'VARCHAR',
                'nppes_provider_last_org_name': 'VARCHAR',
                'nppes_provider_first_name': 'VARCHAR',
                'nppes_provider_mi': 'VARCHAR',
                'nppes_credentials': 'VARCHAR',
                'nppes_provider_gender': 'VARCHAR',
                'nppes_entity_code': 'VARCHAR',
                'nppes_provider_street1': 'VARCHAR',
                'nppes_provider_street2': 'VARCHAR',
                'nppes_provider_city': 'VARCHAR',
                'nppes_provider_zip': 'VARCHAR',
                'nppes_provider_state': 'VARCHAR',
                'nppes_provider_country': 'VARCHAR',
                'provider_type': 'VARCHAR',
                'medicare_participation_indicator': 'VARCHAR',
                'place_of_service': 'VARCHAR',
                'hcpcs_code': 'VARCHAR',
                'hcpcs_description': 'VARCHAR',
                'hcpcs_drug_indi...
Output
File created successfully at: /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/01_profile_and_macro.py
Bash
uv run --with duckdb outputs/medicare/01_profile_and_macro.py 2>&1
Error
Exit code 1
Downloading duckdb (13.6MiB)
 Downloaded duckdb
Installed 1 package in 4ms
Traceback (most recent call last):
  File "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/01_profile_and_macro.py", line 19, in <module>
    con.execute(f"""
    ~~~~~~~~~~~^^^^^
        CREATE OR REPLACE VIEW medicare_{year} AS
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<33 lines>...
        WHERE npi != '0000000001'
        ^^^^^^^^^^^^^^^^^^^^^^^^^
    """)
    ^^^^
_duckdb.InvalidInputException: Invalid Input Error: Error when sniffing file "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2012_update/Medicare_Provider_Util_Payment_PUF_CY2012.txt".
It was not possible to automatically detect the CSV parsing dialect
The search space used was:
Delimiter Candidates: '	'
Quote/Escape Candidates: ['(no quote)','(no escape)'],['"','(no escape)'],['"','"'],['"','''],['"','\'],[''','(no escape)'],[''','''],[''','"'],[''','\']
Comment Candidates: '\0', '#'
Encoding: utf-8
Possible fixes:
* Disable the parser's strict mode (strict_mode=false) to allow reading rows that do not comply with the CSV standard.
* Columns are set as: "columns = { 'npi' : 'VARCHAR', 'nppes_provider_last_org_name' : 'VARCHAR', 'nppes_provider_first_name' : 'VARCHAR', 'nppes_provider_mi' : 'VARCHAR', 'nppes_credentials' : 'VARCHAR', 'nppes_provider_gender' : 'VARCHAR', 'nppes_entity_code' : 'VARCHAR', 'nppes_provider_street1' : 'VARCHAR', 'nppes_provider_street2' : 'VARCHAR', 'nppes_provider_city' : 'VARCHAR', 'nppes_provider_zip' : 'VARCHAR', 'nppes_provider_state' : 'VARCHAR', 'nppes_provider_country' : 'VARCHAR', 'provider_type' : 'VARCHAR', 'medicare_participation_indicator' : 'VARCHAR', 'place_of_service' : 'VARCHAR', 'hcpcs_code' : 'VARCHAR', 'hcpcs_description' : 'VARCHAR', 'hcpcs_drug_indicator' : 'VARCHAR', 'line_srvc_cnt' : 'DOUBLE', 'bene_unique_cnt' : 'DOUBLE', 'bene_day_srvc_cnt' : 'DOUBLE', 'average_Medicare_allowed_amt' : 'DOUBLE', 'average_submitted_chrg_amt' : 'DOUBLE', 'average_Medicare_payment_amt' : 'DOUBLE', 'average_Medicare_standard_amt' : 'DOUBLE'}", and they contain: 26 columns. It does not match the number of columns found by the sniffer: 28. Verify the columns parameter is correctly set.
* Make sure you are using the correct file encoding. If not, set it (e.g., encoding = 'utf-16').
* Delimiter is set to '	'. Consider unsetting it.
* Set quote (e.g., quote='"')
* Set escape (e.g., escape='"')
* Set comment (e.g., comment='#')
* Set skip (skip=${n}) to skip ${n} lines at the top of the file
* Enable null padding (null_padding=true) to pad missing columns with NULL values
* Check you are using the correct file compression, otherwise set it (e.g., compression = 'zstd')
* Be sure that the maximum line size is set to an appropriate value, otherwise set it (e.g., max_line_size=10000000)


LINE 4:         FROM read_csv('/Users/nrh146/Documents/agentic-investigation...
                     ^
Thinking...
The sniffer found 28 columns but our schema has 26. Let me check the actual header more carefully.
Bash
head -1 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2012_update/Medicare_Provider_Util_Payment_PUF_CY2012.txt | tr '\t' '\n' | nl
Output
     1	NPI
     2	NPPES_PROVIDER_LAST_ORG_NAME
     3	NPPES_PROVIDER_FIRST_NAME
     4	NPPES_PROVIDER_MI
     5	NPPES_CREDENTIALS
     6	NPPES_PROVIDER_GENDER
     7	NPPES_ENTITY_CODE
     8	NPPES_PROVIDER_STREET1
     9	NPPES_PROVIDER_STREET2
    10	NPPES_PROVIDER_CITY
    11	NPPES_PROVIDER_ZIP
    12	NPPES_PROVIDER_STATE
    13	NPPES_PROVIDER_COUNTRY
    14	PROVIDER_TYPE
    15	MEDICARE_PARTICIPATION_INDICATOR
    16	PLACE_OF_SERVICE
    17	HCPCS_CODE
    18	HCPCS_DESCRIPTION
    19	HCPCS_DRUG_INDICATOR
    20	LINE_SRVC_CNT
    21	BENE_UNIQUE_CNT
    22	BENE_DAY_SRVC_CNT
    23	AVERAGE_MEDICARE_ALLOWED_AMT
    24	STDEV_MEDICARE_ALLOWED_AMT
    25	AVERAGE_SUBMITTED_CHRG_AMT
    26	STDEV_SUBMITTED_CHRG_AMT
    27	AVERAGE_MEDICARE_PAYMENT_AMT
    28	STDEV_MEDICARE_PAYMENT_AMT
Bash
head -1 /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/Medicare_Provider_Util_Payment_PUF_CY2015/Medicare_Provider_Util_Payment_PUF_CY2015.txt | tr '\t' '\n' | nl
Output
     1	npi
     2	nppes_provider_last_org_name
     3	nppes_provider_first_name
     4	nppes_provider_mi
     5	nppes_credentials
     6	nppes_provider_gender
     7	nppes_entity_code
     8	nppes_provider_street1
     9	nppes_provider_street2
    10	nppes_provider_city
    11	nppes_provider_zip
    12	nppes_provider_state
    13	nppes_provider_country
    14	provider_type
    15	medicare_participation_indicator
    16	place_of_service
    17	hcpcs_code
    18	hcpcs_description
    19	hcpcs_drug_indicator
    20	line_srvc_cnt
    21	bene_unique_cnt
    22	bene_day_srvc_cnt
    23	average_Medicare_allowed_amt
    24	average_submitted_chrg_amt
    25	average_Medicare_payment_amt
    26	average_Medicare_standard_amt
Thinking...
The 2012 file has 28 columns (with STDEV columns and uppercase headers), while 2015 has 26 columns (with average_Medicare_standard_amt and lowercase headers). The schemas differ between years. Let me also check 2013 and 2014.
Bash
for f in /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare/*/Medicare_Provider_Util_Payment_PUF_CY*.txt; do echo "=== $(basename $(dirname $f)) ==="; head -1 "$f" | tr '\t' '\n' | wc -l; done
Output
=== Medicare_Provider_Util_Payment_PUF_CY2012_update ===
      28
=== Medicare_Provider_Util_Payment_PUF_CY2013 ===
      28
=== Medicare_Provider_Util_Payment_PUF_CY2014 ===
      26
=== Medicare_Provider_Util_Payment_PUF_CY2015 ===
      26
Claude

2012-2013 have 28 columns (with STDEV columns, no standard_amt), 2014-2015 have 26 columns (with standard_amt, no STDEVs). Let me fix the script to handle both schemas.

Write
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/01_profile_and_macro.py
File content
"""
Phase 1-2: Profile the Medicare data and compute macro trends using DuckDB.
Handles schema differences: 2012-2013 have STDEV cols, 2014-2015 have standard_amt col.
"""
import duckdb

con = duckdb.connect()

BASE = "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare"

# 2012-2013: uppercase headers, 28 cols (includes STDEV cols, no standard_amt)
for year in [2012, 2013]:
    suffix = "_update" if year == 2012 else ""
    path = f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY{year}{suffix}/Medicare_Provider_Util_Payment_PUF_CY{year}.txt"
    con.execute(f"""
        CREATE OR REPLACE VIEW medicare_{year} AS
        SELECT
            NPI as npi,
            NPPES_PROVIDER_LAST_ORG_NAME as nppes_provider_last_org_name,
            NPPES_PROVIDER_FIRST_NAME as nppes_provider_first_name,
            NPPES_PROVIDER_MI as nppes_provider_mi,
            NPPES_CREDENTIALS as nppes_credentials,
            NPPES_PROVIDER_GENDER as nppes_provider_gender,
            NPPES_ENTITY_CODE as nppes_entity_code,
            NPPES_PROVIDER_STREET1 as nppes_provider_street1,
            NPPES_PROVIDER_STREET2 as nppes_provider_street2,
            NPPES_PROVIDER_CITY as nppes_provider_city,
            NPPES_PROVIDER_ZIP as nppes_provider_zip,
            NPPES_PROVIDER_STATE as nppes_provider_state,
            NPPES_PROVIDER_COUNTRY as nppes_provider_country,
            PROVIDER_TYPE as provider_type,
            MEDICARE_PARTICIPATION_INDICATOR as medicare_participation_indicator,
            PLACE_OF_SERVICE as place_of_service,
            HCPCS_CODE as hcpcs_code,
            HCPCS_DESCRIPTION as hcpcs_description,
            HCPCS_DRUG_INDICATOR as hcpcs_drug_indicator,
            LINE_SRVC_CNT as line_srvc_cnt,
            BENE_UNIQUE_CNT as bene_unique_cnt,
            BENE_DAY_SRVC_CNT as bene_day_srvc_cnt,
            AVERAGE_MEDICARE_ALLOWED_AMT as average_Medicare_allowed_amt,
            AVERAGE_SUBMITTED_CHRG_AMT as average_subm...
Output
The file /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/01_profile_and_macro.py has been updated successfully.
Bash
uv run --with duckdb --with pandas outputs/medicare/01_profile_and_macro.py 2>&1
Output
Downloading numpy (5.0MiB)
 Downloaded numpy
Installed 5 packages in 70ms
================================================================================
BASIC PROFILING
================================================================================

--- Row counts by year ---
   year     rows  unique_providers  unique_procedures  unique_states
0  2012  9153272            880644               5949             61
1  2013  9287876            909605               5983             61
2  2014  9316307            938146               5972             61
3  2015  9497891            968417               5983             61

--- Entity types (I=individual, O=organization) ---
   year nppes_entity_code      cnt  unique_npis
0  2012                 I  8773306       825112
1  2012                 O   379966        55532
2  2013                 I  8882031       852026
3  2013                 O   405845        57579
4  2014                 I  8913180       877292
5  2014                 O   403127        60854
6  2015                 I  9074050       906397
7  2015                 O   423841        62020

================================================================================
MACRO TRENDS (2012-2015)
================================================================================

--- Total Medicare payments and services by year ---
   year     rows  total_services  total_medicare_payments  total_submitted_charges  total_allowed_amt  total_bene_encounters
0  2012  9153272    2.194068e+09             7.739405e+10             2.523995e+11       9.931592e+10            834770973.0
1  2013  9287876    2.189253e+09             7.679910e+10             2.624787e+11       1.002172e+11            842761312.0
2  2014  9316307    2.270577e+09             7.822077e+10             2.717542e+11       1.020282e+11            848335652.0
3  2015  9497891    2.359294e+09             8.062372e+10             2.864149e+11       1.048229e+11            862558384.0

--- Average markup ratio (submitted / allowed) by year ---
   year  markup_ratio  payment_to_allowed_ratio
0  2012         2.541                     0.779
1  2013         2.619                     0.766
2  2014         2.664                     0.767
3  2015         2.732                     0.769

--- Top 15 provider types by total Medicare payments (2015) ---
                 provider_type  num_providers  total_services  total_payments  payment_per_provider
0            Internal Medicine          96605     195218378.0    8.860412e+09               91718.0
1                Ophthalmology          17342      56182289.0    6.151379e+09              354710.0
2          Clinical Laboratory           2862     352136209.0    4.965150e+09             1734853.0
3   Ambulance Service Supplier           9680     161648810.0    4.905119e+09              506727.0
4              Family Practice          80928     124335259.0    4.671160e+09               57720.0
5                   Cardiology          20656      76699314.0    4.208689e+09              203751.0
6         Diagnostic Radiology          28851     157989453.0    3.243601e+09              112426.0
7   Ambulatory Surgical Center           4794       6266788.0    2.618131e+09              546127.0
8          Hematology/Oncology           7825     239279855.0    2.509874e+09              320751.0
9                  Dermatology          11108      44303676.0    2.322611e+09              209094.0
10          Emergency Medicine          40634      28383771.0    2.315826e+09               56992.0
11          Orthopedic Surgery          20742      40182934.0    2.102680e+09              101373.0
12          Physical Therapist          43826      97226871.0    2.001104e+09               45660.0
13          Nurse Practitioner          78791      49727688.0    1.847692e+09               23451.0
14                  Nephrology           8081      32114085.0    1.695260e+09              209783.0

--- Provider type growth: biggest pct changes 2012 to 2015 ---
                              provider_type      pay_2012      pay_2015  dollar_change  pct_change
0           Licensed Clinical Social Worker  1.278266e+08  2.375508e+08    109724211.0        85.8
1                           Pain Management  1.671015e+08  3.051395e+08    138038017.0        82.6
2                        Nurse Practitioner  1.112003e+09  1.847692e+09    735689562.0        66.2
3                 Cardiac Electrophysiology  2.040099e+08  3.161324e+08    112122402.0        55.0
4                       Physician Assistant  7.652967e+08  1.110642e+09    345345016.0        45.1
5                     Clinical Psychologist  3.066180e+08  4.392329e+08    132614948.0        43.3
6                  Interventional Radiology  1.462294e+08  1.952640e+08     49034551.0        33.5
7           Mass Immunization Roster Biller  2.139696e+08  2.701042e+08     56134565.0        26.2
8                              Rheumatology  1.044462e+09  1.248153e+09    203690925.0        19.5
9                                Psychiatry  7.422675e+08  8.698778e+08    127610231.0        17.2
10                                     CRNA  3.796383e+08  4.414333e+08     61794981.0        16.3
11                       Allergy/Immunology  1.645634e+08  1.908748e+08     26311411.0        16.0
12                         Vascular Surgery  4.853315e+08  5.592332e+08     73901770.0        15.2
13                      All Other Suppliers  1.035373e+08  1.188180e+08     15280766.0        14.8
14                                Optometry  6.800570e+08  7.554012e+08     75344241.0        11.1
15                       Physical Therapist  1.804842e+09  2.001104e+09    196262387.0        10.9
16               Ambulatory Surgical Center  2.373509e+09  2.618131e+09    244622200.0        10.3
17                            Ophthalmology  5.585010e+09  6.151379e+09    566368326.0        10.1
18             Critical Care (Intensivists)  2.083829e+08  2.291674e+08     20784531.0        10.0
19                             Neurosurgery  3.108855e+08  3.404819e+08     29596440.0         9.5
20                           Otolaryngology  6.427760e+08  6.991346e+08     56358572.0         8.8
21                           Anesthesiology  9.116801e+08  9.875128e+08     75832720.0         8.3
22                          Family Practice  4.372944e+09  4.671160e+09    298216215.0         6.8
23                             Chiropractic  4.952225e+08  5.247208e+08     29498281.0         6.0
24                       Emergency Medicine  2.187650e+09  2.315826e+09    128176089.0         5.9
25                   Occupational therapist  1.367549e+08  1.429910e+08      6236065.0         4.6
26                         Thoracic Surgery  1.514097e+08  1.576463e+08      6236611.0         4.1
27                            Endocrinology  3.872912e+08  4.024885e+08     15197335.0         3.9
28                              Dermatology  2.235388e+09  2.322611e+09     87223044.0         3.9
29                       Infectious Disease  5.233632e+08  5.430487e+08     19685508.0         3.8
30                      Clinical Laboratory  4.824116e+09  4.965150e+09    141033511.0         2.9
31     Physical Medicine and Rehabilitation  8.398509e+08  8.596767e+08     19825830.0         2.4
32                        Internal Medicine  8.736311e+09  8.860412e+09    124100604.0         1.4
33                               Nephrology  1.685607e+09  1.695260e+09      9652785.0         0.6
34               Ambulance Service Supplier  4.916023e+09  4.905119e+09    -10903809.0        -0.2
35                        Pulmonary Disease  1.356833e+09  1.345555e+09    -11278061.0        -0.8
36       Plastic and Reconstructive Surgery  1.334712e+08  1.324548e+08     -1016438.0        -0.8
37                       Orthopedic Surgery  2.121360e+09  2.102680e+09    -18679971.0        -0.9
38           Interventional Pain Management  3.660566e+08  3.597463e+08     -6310345.0        -1.7
39  Independent Diagnostic Testing Facility  7.166623e+08  7.027621e+08    -13900269.0        -1.9
40                                 Podiatry  1.331632e+09  1.304216e+09    -27415429.0        -2.1
41                                Neurology  1.270178e+09  1.232737e+09    -37441595.0        -2.9
42                           Portable X-ray  2.482643e+08  2.393527e+08     -8911622.0        -3.6
43                          General Surgery  9.046800e+08  8.649767e+08    -39703269.0        -4.4
44                     Diagnostic Radiology  3.408879e+09  3.243601e+09   -165278182.0        -4.8
45                       Geriatric Medicine  1.718803e+08  1.634860e+08     -8394379.0        -4.9
46                       Radiation Oncology  1.499626e+09  1.413984e+09    -85641764.0        -5.7
47                         Gastroenterology  1.344336e+09  1.259962e+09    -84373572.0        -6.3
48                    Obstetrics/Gynecology  3.145407e+08  2.943138e+08    -20226875.0        -6.4
49                                  Urology  1.385385e+09  1.289390e+09    -95995872.0        -6.9
50                      Hematology/Oncology  2.703879e+09  2.509874e+09   -194005764.0        -7.2
51                         Medical Oncology  8.066378e+08  7.352141e+08    -71423621.0        -8.9
52                                Pathology  9.965194e+08  8.947638e+08   -101755530.0       -10.2
53                          Cardiac Surgery  1.796168e+08  1.563264e+08    -23290479.0       -13.0
54                               Cardiology  4.965537e+09  4.208689e+09   -756847848.0       -15.2
55                         General Practice  3.742832e+08  3.000142e+08    -74269046.0       -19.8
56                               Hematology  1.276151e+08  9.386029e+07    -33754759.0       -26.5

--- Drug vs non-drug spending trends ---
   year hcpcs_drug_indicator  total_payments  providers  procedures
0  2012                    N    7.199174e+10     880558        5577
1  2012                    Y    5.402309e+09     216900         372
2  2013                    N    7.094365e+10     909521        5622
3  2013                    Y    5.855450e+09     222352         361
4  2014                    N    7.201231e+10     938049        5614
5  2014                    Y    6.208463e+09     226235         358
6  2015                    N    7.319996e+10     968276        5625
7  2015                    Y    7.423759e+09     233325         358

--- Top 15 states by per-provider Medicare payments (2015) ---
   nppes_provider_state  providers  total_payments  per_provider
0                    FL      59326    7.659076e+09      129102.0
1                    NJ      30365    3.709492e+09      122163.0
2                    AL      13672    1.629608e+09      119193.0
3                    NV       6359    7.317890e+08      115079.0
4                    CA      76948    8.347249e+09      108479.0
5                    MD      20294    2.111904e+09      104065.0
6                    SC      14096    1.446544e+09      102621.0
7                    AR       7961    8.094994e+08      101683.0
8                    DE       3577    3.593472e+08      100460.0
9                    MS       7867    7.761385e+08       98657.0
10                   AZ      18755    1.788569e+09       95365.0
11                   TX      60933    5.756052e+09       94465.0
12                   GA      26165    2.357113e+09       90086.0
13                   VA      23695    2.116261e+09       89313.0
14                   TN      22383    1.984557e+09       88664.0

--- Top 20 HCPCS codes by total payment (2015) ---
   hcpcs_code                                                                                       hcpcs_description hcpcs_drug_indicator  providers  total_services  total_payments
0       99214                              Established patient office or other outpatient, visit typically 25 minutes                    N     377277      94678344.0    6.836723e+09
1       99213                              Established patient office or other outpatient visit, typically 15 minutes                    N     422117      98711610.0    4.753546e+09
2       99232                                        Subsequent hospital inpatient care, typically 25 minutes per day                    N     170142      48548303.0    2.746183e+09
3       99233                                        Subsequent hospital inpatient care, typically 35 minutes per day                    N     114129      21669485.0    1.778690e+09
4       A0427                 Ambulance service, advanced life support, emergency transport, level 1 (als1-emergency)                    N       7210       4987475.0    1.657162e+09
5       99223                                           Initial hospital inpatient care, typically 70 minutes per day                    N     131331      10469913.0    1.654690e+09
6       66984                                                              Removal of cataract with insertion of lens                    N      16723       7193244.0    1.653606e+09
7       99285                         Emergency department visit, problem with significant threat to life or function                    N      51936      11214123.0    1.503987e+09
8       J0178                                                                            Injection, aflibercept, 1 mg                    Y       2360       1749677.0    1.335266e+09
9       97110  Therapeutic exercise to develop strength, endurance, range of motion, and flexibility, each 15 minutes                    N      49699      48424751.0    1.018743e+09
10      99204                                      New patient office or other outpatient visit, typically 45 minutes                    N     164269       8724691.0    1.003872e+09
11      A0428                                   Ambulance service, basic life support, non-emergency transport, (bls)                    N       3263       5802832.0    9.804446e+08
12      92014          Eye and medical examination for diagnosis and treatment, established patient, 1 or more visits                    N      35933      11820085.0    9.754617e+08
13      99291                           Critical care delivery critically ill or injured patient, first 30-74 minutes                    N      58819       5050586.0    8.941969e+08
14      99215                              Established patient office or other outpatient, visit typically 40 minutes                    N     108208       8732713.0    8.770550e+08
15      A0425                                                                        Ground mileage, per statute mile                    N       9355     141108693.0    8.703474e+08
16      J2778                                                                          Injection, ranibizumab, 0.1 mg                    Y       1587       2798953.0    8.592801e+08
17      A0429                              Ambulance service, basic life support, emergency transport (bls-emergency)                    N       8235       2913871.0    8.277199e+08
18      90670                                                          Pneumococcal vaccine for injection into muscle                    Y      77468       4867462.0    8.002769e+08
19      88305                             Pathology examination of tissue using a microscope, intermediate complexity                    N      15184      18981514.0    7.334256e+08

--- Gender distribution trends ---
   year nppes_provider_gender  unique_providers
0  2012                     F            302008
1  2012                     M            523104
2  2013                     F            323056
3  2013                     M            528970
4  2014                     F            344121
5  2014                     M            533171
6  2015                     F            368147
7  2015                     M            538250
Thinking...
Good data. Now let me run the signal detection phase - looking for outliers, concentrations, and specific newsworthy patterns. Let me run multiple analyses in parallel.
Claude

Excellent — the macro picture is clear. Now let me dig into outliers, concentrations, and specific investigative leads.

Write
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/02_outliers_and_leads.py
File content
"""
Phase 2-3: Outlier detection, concentration analysis, and lead development.
"""
import duckdb

con = duckdb.connect()

BASE = "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare"

# Set up views (same as before)
for year in [2012, 2013]:
    suffix = "_update" if year == 2012 else ""
    path = f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY{year}{suffix}/Medicare_Provider_Util_Payment_PUF_CY{year}.txt"
    con.execute(f"""
        CREATE OR REPLACE VIEW medicare_{year} AS
        SELECT
            NPI as npi, NPPES_PROVIDER_LAST_ORG_NAME as nppes_provider_last_org_name,
            NPPES_PROVIDER_FIRST_NAME as nppes_provider_first_name,
            NPPES_PROVIDER_GENDER as nppes_provider_gender,
            NPPES_ENTITY_CODE as nppes_entity_code,
            NPPES_PROVIDER_CITY as nppes_provider_city,
            NPPES_PROVIDER_ZIP as nppes_provider_zip,
            NPPES_PROVIDER_STATE as nppes_provider_state,
            NPPES_PROVIDER_COUNTRY as nppes_provider_country,
            PROVIDER_TYPE as provider_type,
            MEDICARE_PARTICIPATION_INDICATOR as medicare_participation_indicator,
            PLACE_OF_SERVICE as place_of_service,
            HCPCS_CODE as hcpcs_code, HCPCS_DESCRIPTION as hcpcs_description,
            HCPCS_DRUG_INDICATOR as hcpcs_drug_indicator,
            LINE_SRVC_CNT as line_srvc_cnt, BENE_UNIQUE_CNT as bene_unique_cnt,
            BENE_DAY_SRVC_CNT as bene_day_srvc_cnt,
            AVERAGE_MEDICARE_ALLOWED_AMT as average_Medicare_allowed_amt,
            AVERAGE_SUBMITTED_CHRG_AMT as average_submitted_chrg_amt,
            AVERAGE_MEDICARE_PAYMENT_AMT as average_Medicare_payment_amt,
            {year} as year
        FROM read_csv('{path}', delim='\t', header=true, quote='"',
            ignore_errors=true, null_padding=true)
        WHERE NPI != '0000000001'
    """)

for year in [2014, 2015]:
    path = f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY{year}/Medicare_Provider_Util_Paymen...
Output
File created successfully at: /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/02_outliers_and_leads.py
Bash
uv run --with duckdb --with pandas outputs/medicare/02_outliers_and_leads.py 2>&1
Output
<persisted-output>
Output too large (40.4KB). Full output saved to: /Users/nrh146/.claude/projects/-Users-nrh146-Documents-agentic-investigation-newsworthiness-dataonly/b551ad9a-2f1f-4337-8e5d-1ef7962756d1/tool-results/buw5oscm1.txt

Preview (first 2KB):
================================================================================
LEAD 1: TOP 30 INDIVIDUAL PROVIDERS BY TOTAL MEDICARE PAYMENTS (2015)
================================================================================
           npi   last_name   first_name             provider_type state              city  total_payment  total_services  total_benes  num_procedures
0   1811931546     LAGANKE  CHRISTOPHER                 Neurology    AL           CULLMAN     13305836.0        510834.0       3254.0              33
1   1558329847    KUNIMOTO        DEREK             Ophthalmology    AZ           PHOENIX     11077451.0         54725.0       9711.0              22
2   1922021195       EATON    ALEXANDER             Ophthalmology    FL        FORT MYERS     10163018.0         60441.0      17232.0              41
3   1033116074       WELCH         JOHN             Ophthalmology    NE          HASTINGS      8938672.0         31452.0       8294.0              18
4   1275512691      MORICE      WILLIAM                 Pathology    MN         ROCHESTER      8871263.0        915979.0     605766.0             480
5   1760441760    MCGUCKIN        JAMES  Interventional Radiology    NJ             EWING      8623237.0          8813.0       4167.0              42
6   1821058496    MALHOTRA        VIKAS       Hematology/Oncology    FL       SPRING HILL      8306543.0        830224.0      16723.0             134
7   1124076930      KAISER      RICHARD             Ophthalmology    PA  PLYMOUTH MEETING      7843836.0         34561.0       6771.0              21
8   1124010913      SAGGAU        DAVID             Ophthalmology    IA   WEST DES MOINES      7751346.0         31574.0       7468.0              21
9   1437360351     ALLIMAN         KYLE             Ophthalmology    IA   WEST DES MOINES      7460843.0         29900.0       6986.0              22
...
</persisted-output>
Read
/Users/nrh146/.claude/projects/-Users-nrh146-Documents-agentic-investigation-newsworthiness-dataonly/b551ad9a-2f1f-4337-8e5d-1ef7962756d1/tool-results/buw5oscm1.txt
Output
     1→================================================================================
     2→LEAD 1: TOP 30 INDIVIDUAL PROVIDERS BY TOTAL MEDICARE PAYMENTS (2015)
     3→================================================================================
     4→           npi   last_name   first_name             provider_type state              city  total_payment  total_services  total_benes  num_procedures
     5→0   1811931546     LAGANKE  CHRISTOPHER                 Neurology    AL           CULLMAN     13305836.0        510834.0       3254.0              33
     6→1   1558329847    KUNIMOTO        DEREK             Ophthalmology    AZ           PHOENIX     11077451.0         54725.0       9711.0              22
     7→2   1922021195       EATON    ALEXANDER             Ophthalmology    FL        FORT MYERS     10163018.0         60441.0      17232.0              41
     8→3   1033116074       WELCH         JOHN             Ophthalmology    NE          HASTINGS      8938672.0         31452.0       8294.0              18
     9→4   1275512691      MORICE      WILLIAM                 Pathology    MN         ROCHESTER      8871263.0        915979.0     605766.0             480
    10→5   1760441760    MCGUCKIN        JAMES  Interventional Radiology    NJ             EWING      8623237.0          8813.0       4167.0              42
    11→6   1821058496    MALHOTRA        VIKAS       Hematology/Oncology    FL       SPRING HILL      8306543.0        830224.0      16723.0             134
    12→7   1124076930      KAISER      RICHARD             Ophthalmology    PA  PLYMOUTH MEETING      7843836.0         34561.0       6771.0              21
    13→8   1124010913      SAGGAU        DAVID             Ophthalmology    IA   WEST DES MOINES      7751346.0         31574.0       7468.0              21
    14→9   1437360351     ALLIMAN         KYLE             Ophthalmology    IA   WEST DES MOINES      7460843.0         29900.0       6986.0              22
    15→10  1467409383     NIELSEN        JARED             Ophthalmology    IA   WEST DES MOINES      7378290.0         27832.0       6738.0              27
    16→11  1205881976       BAUDO       THOMAS             Ophthalmology    FL        VERO BEACH      7068784.0         37584.0       9297.0              28
    17→12  1972550622    WAFAPOOR      HUSSEIN             Ophthalmology    FL        FORT MYERS      6942774.0         37374.0      10756.0              31
    18→13  1922056712          HO        ALLEN             Ophthalmology    PA  PLYMOUTH MEETING      6901381.0         29574.0       7085.0              19
    19→14  1063489383       GUPTA        SUNIL             Ophthalmology    FL         PENSACOLA      6886405.0         48248.0      12665.0              27
    20→15  1790765337      WAPLES         JOHN       Hematology/Oncology    AL        HUNTSVILLE      6664663.0        809068.0      18742.0             124
    21→16  1881661585       MYERS         JOHN             Ophthalmology    FL         PENSACOLA      6416737.0         47080.0      10442.0              29
    22→17  1588750178       BORNE      MICHAEL             Ophthalmology    MS           JACKSON      6396235.0         26912.0       9423.0              27
    23→18  1174507800       JONES      RICHARD              Rheumatology    AL        TUSCALOOSA      6239463.0        405106.0      20955.0              77
    24→19  1376588582         PAL       ASHISH                Cardiology    FL         DAVENPORT      6123176.0         22673.0       8709.0              67
    25→20  1043288673      CARSON        CRAIG              Rheumatology    OK            EDMOND      6079204.0        252959.0      13939.0              91
    26→21  1033101860      BARNES      CHARLES             Ophthalmology    IA      CEDAR RAPIDS      6071878.0         24911.0       4937.0              16
    27→22  1457325086       SPIRN         MARC             Ophthalmology    PA  PLYMOUTH MEETING      5952156.0         24678.0       5821.0              22
    28→23  1780632570      VANDER        JAMES             Ophthalmology    PA  PLYMOUTH MEETING      5941114.0         26611.0       6798.0              20
    29→24  1497726103      BOCHOW       THOMAS             Ophthalmology    TX             TYLER      5826777.0         28139.0       6688.0              31
    30→25  1982813028  VALACHOVIC      MICHAEL         Internal Medicine    TX            DALLAS      5798429.0        419752.0     300161.0             142
    31→26  1588733125    TIEDEMAN        JAMES             Ophthalmology    VA      FISHERSVILLE      5769589.0         24942.0       5382.0              16
    32→27  1437117116        TRAN          THI               Dermatology    FL      THE VILLAGES      5657495.0         68674.0      34719.0              47
    33→28  1740248038     REGILLO         CARL             Ophthalmology    PA  PLYMOUTH MEETING      5650683.0         27426.0       7142.0              21
    34→29  1477504405       CHANG     MARGARET             Ophthalmology    CA        SACRAMENTO      5629523.0         31390.0      10684.0              29
    35→
    36→================================================================================
    37→LEAD 2: DRUG SPENDING GROWTH — Top drugs by spending increase 2012-2015
    38→================================================================================
    39→   hcpcs_code                                                                                                                                                                 description     pay_2012     pay_2015  dollar_increase  pct_change    svc_2012    svc_2015
    40→0       90670                                                                                                                              Pneumococcal vaccine for injection into muscle    3424636.0  800276942.0      796852305.0     23268.2     25670.0   4867462.0
    41→1       J0897                                                                                                                                                  Injection, denosumab, 1 mg  177050958.0  356262781.0      179211823.0       101.2  15579115.0  30823205.0
    42→2       90662                                                                                                                             Vaccine for influenza for injection into muscle   72451110.0  236439832.0      163988722.0       226.3   2383966.0   6737677.0
    43→3       J0129  Injection, abatacept, 10 mg (code may be used for medicare when drug administered under the direct supervision of a physician, not for use when drug is self administered)  111279022.0  164910045.0       53631023.0        48.2   6542956.0   6179292.0
    44→4       J1745                                                                                                                                                 Injection infliximab, 10 mg  447835631.0  496352664.0       48517032.0        10.8   8974514.0   8456030.0
    45→5       J0585                                                                                                                                       Injection, onabotulinumtoxina, 1 unit   75314056.0  107069202.0       31755146.0        42.2  17597151.0  24799762.0
    46→6       J1568                                                                                   Injection, immune globulin, (octagam), intravenous, non-lyophilized (e.g. liquid), 500 mg   13451359.0   41750339.0       28298981.0       210.4    445626.0   1354937.0
    47→7       J2357                                                                                                                                                 Injection, omalizumab, 5 mg   22114646.0   47585985.0       25471338.0       115.2   1227506.0   2123047.0
    48→8       J3262                                                                                                                                                Injection, tocilizumab, 1 mg   18948688.0   42157724.0       23209036.0       122.5   6963360.0  14398020.0
    49→9       J9264                                                                                                                         Injection, paclitaxel protein-bound particles, 1 mg    8777239.0   30062098.0       21284860.0       242.5   1151996.0   3962862.0
    50→10      J9355                                                                                                                                               Injection, trastuzumab, 10 mg   30013119.0   49727723.0       19714604.0        65.7    513228.0    747324.0
    51→11      J7324                                                                                                Hyaluronan or derivative, orthovisc, for intra-articular injection, per dose   38846997.0   58460912.0       19613915.0        50.5    294457.0    435978.0
    52→12      J7308                                                                                   Aminolevulinic acid hcl for topical administration, 20%, single unit dosage form (354 mg)   13204654.0   28619256.0       15414602.0       116.7    109435.0    136941.0
    53→13      J7192                                                                                          Factor viii (antihemophilic factor, recombinant) per i.u., not otherwise specified   71161939.0   84847035.0       13685096.0        19.2    469278.0    218832.0
    54→14      J2323                                                                                                                                                Injection, natalizumab, 1 mg   33150622.0   46707359.0       13556737.0        40.9   4007357.0   3922231.0
    55→15      J7312                                                                                                                      Injection, dexamethasone, intravitreal implant, 0.1 mg    4874105.0   18062841.0       13188736.0       270.6     32377.0    116092.0
    56→16      J9041                                                                                                                                               Injection, bortezomib, 0.1 mg   51811432.0   61303243.0        9491811.0        18.3   1524712.0   1698966.0
    57→17      J7323                                                                                                 Hyaluronan or derivative, euflexxa, for intra-articular injection, per dose   30428186.0   38389905.0        7961719.0        26.2    255574.0    328418.0
    58→18      J0878                                                                                                                                                 Injection, daptomycin, 1 mg   14293051.0   19417423.0        5124372.0        35.9  34655377.0  34406625.0
    59→19      Q2043            Sipuleucel-t, minimum of 50 million autologous cd54+ cells activated with pap-gm-csf, including leukapheresis and all other preparatory procedures, per infusion   13636972.0   17282822.0        3645850.0        26.7       530.0       618.0
    60→20      J9070                                                                                                                                                    Cyclophosphamide, 100 mg    5161091.0    8171912.0        3010821.0        58.3    363195.0    192760.0
    61→21      J0600                                                                                                                          Injection, edetate calcium disodium, up to 1000 mg    1287127.0    4085722.0        2798595.0       217.4     13166.0     12515.0
    62→22      J0775                                                                                                                   Injection, collagenase, clostridium histolyticum, 0.01 mg    3746895.0    6212307.0        2465412.0        65.8    125715.0    208701.0
    63→23      J9395                                                                                                                                               Injection, fulvestrant, 25 mg    9532581.0   11786498.0        2253918.0        23.6    138988.0    164748.0
    64→24      Q0138                                                                                        Injection, ferumoxytol, for treatment of iron deficiency anemia, 1 mg (non-esrd use)   13306626.0   15352281.0        2045655.0        15.4  26369594.0  24862636.0
    65→
    66→================================================================================
    67→LEAD 3: HIGHEST MARKUP RATIOS — Providers with highest submitted/allowed (2015)
    68→(min $500K total payments, individual providers only)
    69→================================================================================
    70→           npi        last_name    first_name                         provider_type state  markup_ratio  total_payment  total_charges  total_services
    71→0   1700983707             GOEL        SANJIV                            Cardiology    CA          22.6       513758.0     15169107.0          7035.0
    72→1   1770513723              DEV        VISHVA                            Cardiology    CA          18.6       730237.0     17727265.0         18592.0
    73→2   1164637799       KRISHNARAJ          ARUN                  Diagnostic Radiology    VA          15.4       744011.0     14787496.0         98051.0
    74→3   1548363583           MADUKA        GODWIN                        Anesthesiology    NV          15.1       517327.0      9569317.0         15493.0
    75→4   1528053642            KLEIN         GREGG                    Orthopedic Surgery    NJ          12.7       606499.0     10247035.0          7125.0
    76→5   1619955994            PATEL      JASHVANT  Physical Medicine and Rehabilitation    CA          12.3       508596.0      7833378.0          5777.0
    77→6   1144214966        VASIREDDY       SRIDHAR                        Anesthesiology    TX          12.2       521892.0      7859960.0         12970.0
    78→7   1083674410              YUN        ANDREW                    Orthopedic Surgery    CA          11.8       946025.0     14391502.0          3595.0
    79→8   1982677910            COSTA       ANTHONY                    Orthopedic Surgery    NJ          11.8       660293.0     10206974.0         16896.0
    80→9   1629103759              RAO        SUDHIR        Interventional Pain Management    MD          11.7       500040.0      7491000.0          6492.0
    81→10  1053486704           WILLIS       KENNETH        Interventional Pain Management    AL          11.7      1099576.0     15868570.0         10631.0
    82→11  1710991070        ADIPIETRO         FRANK                       Pain Management    NY          11.4       545552.0      8020011.0          7434.0
    83→12  1962430413           LEVINE       MSONTHI                     Internal Medicine    TX          11.4       799088.0     12005096.0         14452.0
    84→13  1922001361           LEVINE        HARLAN                    Orthopedic Surgery    NJ          11.4       723508.0     10852512.0          6571.0
    85→14  1750350633           ZAYDAN  MUHAMMAD-ALI                   Hematology/Oncology    KY          11.2       799467.0     11442785.0        145139.0
    86→15  1952505398         PIZZURRO          MARK                    Orthopedic Surgery    NJ          11.1       720848.0     10495861.0         10528.0
    87→16  1588693675           SLEVIN        KIERAN                        Anesthesiology    NJ          10.8       560851.0      7504950.0          6942.0
    88→17  1780680496          SHOULTZ       CHARLES                            Cardiology    TX          10.7       598940.0      8576357.0         29043.0
    89→18  1578535217  VALENZUELA SOSA         FABIO                    Radiation Oncology    WI          10.7       615331.0      8416077.0          3784.0
    90→19  1427099985            MEHTA        MONICA  Physical Medicine and Rehabilitation    NJ          10.5       610691.0      8214515.0         26394.0
    91→20  1184631509           SAMADI         DAVID                               Urology    NY          10.5       582208.0      8068980.0          5090.0
    92→21  1134162019             KATZ       KENNETH                           Dermatology    WI          10.5       959735.0     13319930.0         35844.0
    93→22  1114977071           NASSIR        YOURAM                            Hematology    CA          10.4      1672131.0     21561225.0        149239.0
    94→23  1114089471          RASTOGI      ABHIJEET                        Anesthesiology    PA          10.3       753803.0      9483570.0         10050.0
    95→24  1932137650    GIANNAKOPOULO        GEORGE                             Neurology    FL           9.8       531854.0      6738457.0          5162.0
    96→
    97→--- Average markup ratio by provider type (2015, top 20 by volume) ---
    98→                 provider_type  avg_markup  total_payments
    99→0            Internal Medicine        2.01    8.860412e+09
   100→1                Ophthalmology        2.10    6.151379e+09
   101→2          Clinical Laboratory        3.77    4.965150e+09
   102→3   Ambulance Service Supplier        2.58    4.905119e+09
   103→4              Family Practice        1.92    4.671160e+09
   104→5                   Cardiology        2.63    4.208689e+09
   105→6         Diagnostic Radiology        3.96    3.243601e+09
   106→7   Ambulatory Surgical Center        4.76    2.618131e+09
   107→8          Hematology/Oncology        2.60    2.509874e+09
   108→9                  Dermatology        1.94    2.322611e+09
   109→10          Emergency Medicine        4.78    2.315826e+09
   110→11          Orthopedic Surgery        3.37    2.102680e+09
   111→12          Physical Therapist        2.04    2.001104e+09
   112→13          Nurse Practitioner        2.47    1.847692e+09
   113→14                  Nephrology        2.27    1.695260e+09
   114→15          Radiation Oncology        3.66    1.413984e+09
   115→16           Pulmonary Disease        2.13    1.345555e+09
   116→17                    Podiatry        1.64    1.304216e+09
   117→18                     Urology        2.67    1.289390e+09
   118→19            Gastroenterology        3.27    1.259962e+09
   119→
   120→================================================================================
   121→LEAD 4: GEOGRAPHIC CONCENTRATION — Per-provider payments by state over time
   122→================================================================================
   123→   state  providers_2015  per_prov_2012  per_prov_2015  pct_change
   124→0     AK            2257        42750.0        50086.0        17.2
   125→1     HI            3217        54684.0        59730.0         9.2
   126→2     VT            2739        41431.0        45134.0         8.9
   127→3     OR           11770        47850.0        50998.0         6.6
   128→4     DE            3577        95239.0       100460.0         5.5
   129→5     NV            6359       109154.0       115079.0         5.4
   130→6     MD           20294        99980.0       104065.0         4.1
   131→7     WY            1816        51530.0        53282.0         3.4
   132→8     NE            6879        66996.0        69170.0         3.2
   133→9     OK           10877        85684.0        88284.0         3.0
   134→10    NH            5838        52062.0        53464.0         2.7
   135→11    UT            6896        58899.0        59910.0         1.7
   136→12    MA           30826        63520.0        64542.0         1.6
   137→13    CO           15397        58609.0        59515.0         1.5
   138→14    AZ           18755        94345.0        95365.0         1.1
   139→15    MT            3663        49415.0        49921.0         1.0
   140→16    IA           10614        63564.0        64125.0         0.9
   141→17    ND            3162        47594.0        47888.0         0.6
   142→18    ID            4714        43772.0        43608.0        -0.4
   143→19    KS            9996        80430.0        79471.0        -1.2
   144→
   145→--- Top 15 Florida cities by total Medicare payments (2015) ---
   146→            city  providers  total_payments  per_provider
   147→0          TAMPA       3361     446490568.0      132845.0
   148→1   JACKSONVILLE       3819     401176199.0      105047.0
   149→2          MIAMI       3802     297322189.0       78202.0
   150→3     FORT MYERS       1458     288756077.0      198049.0
   151→4        ORLANDO       2723     261005032.0       95852.0
   152→5       SARASOTA       1320     249539017.0      189045.0
   153→6         NAPLES       1158     229156366.0      197890.0
   154→7          OCALA        901     183327632.0      203471.0
   155→8     BOCA RATON       1033     169168670.0      163764.0
   156→9    GAINESVILLE       1782     168116578.0       94342.0
   157→10     PENSACOLA       1242     160373189.0      129125.0
   158→11     MELBOURNE        899     133769099.0      148798.0
   159→12  DELRAY BEACH        582     121660928.0      209039.0
   160→13     BRADENTON        754     113639216.0      150715.0
   161→14    CLEARWATER        861     106925090.0      124187.0
   162→
   163→================================================================================
   164→LEAD 5: CARDIOLOGY PAYMENT DECLINE — What's driving the $757M drop?
   165→================================================================================
   166→   hcpcs_code                                                                                                                         description     pay_2012     pay_2015       change  pct_change
   167→0       78452                                                 Nuclear medicine study of vessels of heart using drugs or exercise multiple studies  470137615.0  335000357.0 -135137258.0       -28.7
   168→1       99213                                                          Established patient office or other outpatient visit, typically 15 minutes  288724601.0  223882836.0  -64841765.0       -22.5
   169→2       93306                             Ultrasound examination of heart including color-depicted blood flow rate, direction, and valve function  571965579.0  519364853.0  -52600725.0        -9.2
   170→3       99232                                                                    Subsequent hospital inpatient care, typically 25 minutes per day  288673854.0  245953051.0  -42720803.0       -14.8
   171→4       99223                                                                       Initial hospital inpatient care, typically 70 minutes per day  209381632.0  168436643.0  -40944989.0       -19.6
   172→5       A9500                                                                             Technetium tc-99m sestamibi, diagnostic, per study dose  126404823.0   89344043.0  -37060780.0       -29.3
   173→6       93458                                     Insertion of catheter in left heart for imaging of blood vessels or grafts and left lower heart  116790619.0   80390007.0  -36400612.0       -31.2
   174→7       93015  Exercise or drug-induced heart and blood vessel stress test with EKG monitoring, physician supervision, interpretation, and report   81161093.0   47452015.0  -33709078.0       -41.5
   175→8       A9502                                                                           Technetium tc-99m tetrofosmin, diagnostic, per study dose   60552976.0   36227016.0  -24325960.0       -40.2
   176→9       93000                                                             Routine EKG using at least 12 leads including interpretation and report   91340068.0   75196879.0  -16143189.0       -17.7
   177→10      99222                                                                       Initial hospital inpatient care, typically 50 minutes per day   97383281.0   82173913.0  -15209368.0       -15.6
   178→11      93880                                                Ultrasound scanning of blood flow (outside the brain) on both sides of head and neck   93045814.0   78951711.0  -14094103.0       -15.1
   179→12      J2785                                                                                                      Injection, regadenoson, 0.1 mg   89082623.0   76225730.0  -12856893.0       -14.4
   180→13      93010                                              Routine electrocardiogram (EKG) using at least 12 leads with interpretation and report   77588810.0   65761301.0  -11827509.0       -15.2
   181→14      A9505                                                                       Thallium tl-201 thallous chloride, diagnostic, per millicurie   23500574.0   12136735.0  -11363839.0       -48.4
   182→15      99211                                                           Established patient office or other outpatient visit, typically 5 minutes   24642100.0   14211489.0  -10430611.0       -42.3
   183→16      93224                                                                   Heart rhythm tracing, analysis, and interpretation of 48-hour EKG   26659534.0   16671096.0   -9988438.0       -37.5
   184→17      93459                                     Insertion of catheter in left heart for imaging of blood vessels or grafts and left lower heart   22796423.0   13187804.0   -9608619.0       -42.1
   185→18      99231                                                                    Subsequent hospital inpatient care, typically 15 minutes per day   24762513.0   15291352.0   -9471161.0       -38.2
   186→19      99215                                                          Established patient office or other outpatient, visit typically 40 minutes  105033953.0   96355211.0   -8678742.0        -8.3
   187→
   188→================================================================================
   189→LEAD 6: HIGH SERVICES-PER-BENEFICIARY — Potential overutilization (2015)
   190→(Individual providers, >$200K total payments)
   191→================================================================================
   192→           npi   last_name first_name               provider_type state               city  svcs_per_bene  total_services  total_benes  total_payment
   193→0   1669475737    FOJTASEK     MARVIN          Infectious Disease    TX            ADDISON         1781.4       5750425.0       3228.0      4703150.0
   194→1   1649501958     AKALONU   THEODORE  Ambulance Service Supplier    TX           RICHMOND          770.8         21582.0         28.0       413564.0
   195→2   1598731770   ANASTASIO    PATRICK           Internal Medicine    FL  FORT WALTON BEACH          581.6       1570902.0       2701.0      1752645.0
   196→3   1902236326     RAMIREZ       JUAN  Ambulance Service Supplier    TX            HIDALGO          548.6         24685.0         45.0       505081.0
   197→4   1679634588   PENINGTON    DOUGLAS          Nurse Practitioner    NE              OMAHA          441.9        190001.0        430.0       208263.0
   198→5   1871607226   OGUNSAKIN   MOBOLAJI           Internal Medicine    GA      WARNER ROBINS          430.2        425010.0        988.0       454217.0
   199→6   1740376714       RUBEN      BRUCE          Infectious Disease    MI    WEST BLOOMFIELD          360.7        164499.0        456.0       318077.0
   200→7   1215971668    RAYAPUDI    MADHAVI          Infectious Disease    GA            CUMMING          358.3        991851.0       2768.0      1327101.0
   201→8   1700996758    MACALUSO    VINCENT                   Neurology    NY      NEW HYDE PARK          357.1         52849.0        148.0       701955.0
   202→9   1396842704          LE      THUAN          Infectious Disease    CA           MURRIETA          354.5        325056.0        917.0       392798.0
   203→10  1881610715    ALYASEEN      SAMER          Infectious Disease    CO             AURORA          351.1        332107.0        946.0       410792.0
   204→11  1124130539  DALI AHMAD       RIAD          Infectious Disease    MI        GRAND BLANC          338.0        880566.0       2605.0      1387863.0
   205→12  1740276674       LATHI      ELLEN                   Neurology    MA           BRIGHTON          331.5         54690.0        165.0       703441.0
   206→13  1790991164  GREENSTEIN    JEFFREY                   Neurology    PA       PHILADELPHIA          330.7        106161.0        321.0      1352868.0
   207→14  1861408734      HUSAIN       SYED          Infectious Disease    MI   STERLING HEIGHTS          328.6        402500.0       1225.0       573712.0
   208→15  1700810173      BUSTOS      PETER  Ambulance Service Supplier    TX             CONROE          325.4         14643.0         45.0       275035.0
   209→16  1639255664        GILL     SHAZIA          Infectious Disease    TX             CONROE          322.9        191149.0        592.0       444241.0
   210→17  1922116268        AMIN   JASHVANT         Hematology/Oncology    NJ             ORANGE          311.6        201949.0        648.0       581314.0
   211→18  1588658074       CHANG     SOPHIA                Rheumatology    WA              PASCO          284.3         86435.0        304.0       328022.0
   212→19  1073538047       FOLEY       JOHN                   Neurology    UT     SALT LAKE CITY          284.0        187182.0        659.0      2077452.0
   213→20  1326156191     JOHNSON     HAROLD         Hematology/Oncology    UT              OGDEN          266.8        480032.0       1799.0       720063.0
   214→21  1831342096    RICHARDS       ERIC          Infectious Disease    CO             AURORA          264.0        213849.0        810.0       309396.0
   215→22  1649207242      ARNOLD      ANITA           Internal Medicine    TN            MEMPHIS          255.4        519761.0       2035.0       567127.0
   216→23  1750309811      PARKER     STEVEN          Infectious Disease    NV               RENO          246.3        172416.0        700.0       232275.0
   217→24  1396099099       DURON     CARMEN  Ambulance Service Supplier    TX        SAN ANTONIO          245.7         22600.0         92.0       546176.0
   218→
   219→================================================================================
   220→LEAD 7: NURSE PRACTITIONER BILLING SURGE — Top NP billers (2015)
   221→================================================================================
   222→           npi      last_name first_name state          city  total_payment  total_services  total_benes  num_procedures
   223→0   1538382627         MILLER     MARTHA    IN   TERRE HAUTE      2765716.0        236406.0       1544.0              27
   224→1   1992955041  BARBER-WALKER      KELLY    TX        AUSTIN      2179970.0         69196.0        606.0              11
   225→2   1467452938         MORRIS  KIMBERLEY    NV        SPARKS      1490559.0        113126.0       1065.0              23
   226→3   1972510303         BAYLES      ANITA    TN     CLEVELAND      1425876.0         64736.0      16308.0              67
   227→4   1003800616         TAYLOR    PHYLLIS    TN   CLARKSVILLE      1412556.0         62948.0       8388.0              33
   228→5   1457327793         CARTER     SHEILA    TN       DENMARK      1219981.0          2144.0        438.0               2
   229→6   1750310124         CHANEY   KIMBERLY    AL       CULLMAN      1105324.0         33750.0        852.0              16
   230→7   1366786394         KELNER   MARGARET    TX    SAN MARCOS      1076308.0        101079.0       1030.0              20
   231→8   1245640085          CLARK        AMY    TN  JOHNSON CITY      1028711.0         39507.0      10797.0              34
   232→9   1982776563           TOTH     SHARON    AZ   QUEEN CREEK      1013201.0         33394.0       1139.0               8
   233→10  1982863882       AKIOYAME   FRANKLIN    NV     LAS VEGAS      1011891.0         10504.0       2753.0               9
   234→11  1790088482           LADD  CHRISTINE    TN     MARYVILLE       974527.0         42406.0       6660.0              52
   235→12  1891951547         BUTLER       JENE    TX        MURPHY       931943.0         43264.0        616.0               9
   236→13  1518920511          JONES    MARYANN    MS       FLOWOOD       881020.0         41933.0       8872.0              50
   237→14  1194740571          MCKEE     AMANDA    IL  MOUNT VERNON       856350.0         63580.0       3148.0              25
   238→15  1831189281         YEAGER    KRISTIN    TX        AUSTIN       835701.0         79295.0        928.0              21
   239→16  1790955342      WIECZOREK     CHERYL    PA     WYNNEWOOD       813301.0         34502.0       1565.0              18
   240→17  1558503581          ERWIN      SHERI    AZ       PHOENIX       810196.0         77763.0       1509.0              35
   241→18  1487885679          EWING   JENNIFER    IN     MISHAWAKA       802242.0        118499.0       1784.0              37
   242→19  1194036327           VANN       MARK    TN       DECHERD       788820.0         34038.0       4732.0              22
   243→
   244→--- NP billing growth by state (top 15 by growth) ---
   245→   state  np_2012  np_2015  provider_pct_growth    pay_2012    pay_2015  payment_pct_growth
   246→0     NV      252      494            96.000000   5078564.0  16398653.0               222.9
   247→1     DE      165      291            76.400002   3451426.0   7610680.0               120.5
   248→2     OK      445      817            83.599998   8316060.0  17443811.0               109.8
   249→3     AL      704     1342            90.599998  15195608.0  31383478.0               106.5
   250→4     UT      345      530            53.599998   4104113.0   8138073.0                98.3
   251→5     SC      685     1179            72.099998  15310068.0  30227264.0                97.4
   252→6     AZ     1090     1701            56.099998  20265287.0  38175732.0                88.4
   253→7     NJ     1045     1667            59.500000  29985032.0  55395246.0                84.7
   254→8     MO     1411     2096            48.500000  23861925.0  43290245.0                81.4
   255→9     GA     1376     2262            64.400002  24853174.0  45024898.0                81.2
   256→10    LA      902     1449            60.599998  19546470.0  35305271.0                80.6
   257→11    MD      962     1430            48.599998  24603938.0  44336511.0                80.2
   258→12    CO      633     1040            64.300003  10336472.0  18459363.0                78.6
   259→13    IL     1644     2799            70.300003  37613694.0  67166865.0                78.6
   260→14    NC     1872     2898            54.799999  44169208.0  78746419.0                78.3
   261→
   262→================================================================================
   263→LEAD 8: OPHTHALMOLOGY DRUG SPENDING — Top providers for eye injection drugs (2015)
   264→================================================================================
   265→           npi  last_name first_name state              city hcpcs_code               hcpcs_description  services  beneficiaries  total_payment  avg_payment
   266→0   1558329847   KUNIMOTO      DEREK    AZ           PHOENIX      J2778  Injection, ranibizumab, 0.1 mg   28760.0            722      8855171.0       307.90
   267→1   1033116074      WELCH       JOHN    NE          HASTINGS      J0178    Injection, aflibercept, 1 mg    9574.0            790      7349989.0       767.70
   268→2   1124076930     KAISER    RICHARD    PA  PLYMOUTH MEETING      J2778  Injection, ranibizumab, 0.1 mg   18562.0            676      5718857.0       308.09
   269→3   1922021195      EATON  ALEXANDER    FL        FORT MYERS      J2778  Injection, ranibizumab, 0.1 mg   17046.0            678      5248888.0       307.92
   270→4   1467409383    NIELSEN      JARED    IA   WEST DES MOINES      J0178    Injection, aflibercept, 1 mg    5954.0            555      4566882.0       767.03
   271→5   1124010913     SAGGAU      DAVID    IA   WEST DES MOINES      J0178    Injection, aflibercept, 1 mg    5647.0            604      4332908.0       767.29
   272→6   1205881976      BAUDO     THOMAS    FL        VERO BEACH      J2778  Injection, ranibizumab, 0.1 mg   13050.0            343      4033269.0       309.06
   273→7   1518994631       DYER      DAVID    KS   SHAWNEE MISSION      J2778  Injection, ranibizumab, 0.1 mg   13071.0            365      4023445.0       307.81
   274→8   1588750178      BORNE    MICHAEL    MS           JACKSON      J0178    Injection, aflibercept, 1 mg    5282.0            514      4011493.0       759.46
   275→9   1881661585      MYERS       JOHN    FL         PENSACOLA      J2778  Injection, ranibizumab, 0.1 mg   12937.0            370      3996929.0       308.95
   276→10  1922056712         HO      ALLEN    PA  PLYMOUTH MEETING      J2778  Injection, ranibizumab, 0.1 mg   12560.0            472      3877260.0       308.70
   277→11  1811992563    HERSHEY   JONATHAN    WI         MILWAUKEE      J0178    Injection, aflibercept, 1 mg    5040.0            385      3858493.0       765.57
   278→12  1801892146  BREAZEALE    RICHARD    TN       CHATTANOOGA      J2778  Injection, ranibizumab, 0.1 mg   12238.0            322      3742418.0       305.80
   279→13  1972550622   WAFAPOOR    HUSSEIN    FL        FORT MYERS      J2778  Injection, ranibizumab, 0.1 mg   11919.0            474      3679402.0       308.70
   280→14  1437360351    ALLIMAN       KYLE    IA   WEST DES MOINES      J0178    Injection, aflibercept, 1 mg    4794.0            466      3664272.0       764.35
   281→15  1033101860     BARNES    CHARLES    IA      CEDAR RAPIDS      J2778  Injection, ranibizumab, 0.1 mg   11304.0            445      3493716.0       309.07
   282→16  1902860349    HALPERN     ROBERT    GA           ATLANTA      J2778  Injection, ranibizumab, 0.1 mg   11073.0            467      3409498.0       307.91
   283→17  1972587491    MANDELL      BARRY    VA    VIRGINIA BEACH      J0178    Injection, aflibercept, 1 mg    4430.0            318      3359044.0       758.25
   284→18  1780632570     VANDER      JAMES    PA  PLYMOUTH MEETING      J2778  Injection, ranibizumab, 0.1 mg   10760.0            471      3321080.0       308.65
   285→19  1700985496      AVERY     ROBERT    CA     SANTA BARBARA      J2778  Injection, ranibizumab, 0.1 mg   10598.0            335      3275260.0       309.05
   286→
   287→--- Aflibercept (J0178) and Ranibizumab (J2778) total spending by year ---
   288→   year hcpcs_code  providers  total_services  total_payments
   289→0  2012      J2778       1604       3004486.0    9.561330e+08
   290→1  2013      J0178       1853       1028450.0    7.889215e+08
   291→2  2013      J2778       1650       3280998.0    1.020988e+09
   292→3  2014      J0178       2084       1243685.0    9.499888e+08
   293→4  2014      J2778       1737       3245743.0    1.003419e+09
   294→5  2015      J0178       2360       1749677.0    1.335266e+09
   295→6  2015      J2778       1587       2798953.0    8.592801e+08
   296→
   297→================================================================================
   298→LEAD 9: NON-PARTICIPATING PROVIDERS — Higher charges to beneficiaries
   299→================================================================================
   300→  medicare_participation_indicator  providers  total_payments  total_charges  avg_markup

... (33 more lines truncated)
Thinking...
Excellent data. Now let me compile the tipsheet. Let me also run one more targeted analysis for a few remaining leads before writing up the tipsheet.
Claude

Now let me run a few more targeted analyses to fill out remaining leads.

Write
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/03_additional_leads.py
File content
"""
Additional targeted analyses for lead development.
"""
import duckdb

con = duckdb.connect()

BASE = "/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/datasets/medicare"

for year in [2012, 2013]:
    suffix = "_update" if year == 2012 else ""
    path = f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY{year}{suffix}/Medicare_Provider_Util_Payment_PUF_CY{year}.txt"
    con.execute(f"""
        CREATE OR REPLACE VIEW medicare_{year} AS
        SELECT NPI as npi, NPPES_PROVIDER_LAST_ORG_NAME as nppes_provider_last_org_name,
            NPPES_PROVIDER_FIRST_NAME as nppes_provider_first_name,
            NPPES_PROVIDER_GENDER as nppes_provider_gender, NPPES_ENTITY_CODE as nppes_entity_code,
            NPPES_PROVIDER_CITY as nppes_provider_city, NPPES_PROVIDER_ZIP as nppes_provider_zip,
            NPPES_PROVIDER_STATE as nppes_provider_state, NPPES_PROVIDER_COUNTRY as nppes_provider_country,
            PROVIDER_TYPE as provider_type, PLACE_OF_SERVICE as place_of_service,
            HCPCS_CODE as hcpcs_code, HCPCS_DESCRIPTION as hcpcs_description,
            HCPCS_DRUG_INDICATOR as hcpcs_drug_indicator,
            LINE_SRVC_CNT as line_srvc_cnt, BENE_UNIQUE_CNT as bene_unique_cnt,
            BENE_DAY_SRVC_CNT as bene_day_srvc_cnt,
            AVERAGE_MEDICARE_ALLOWED_AMT as average_Medicare_allowed_amt,
            AVERAGE_SUBMITTED_CHRG_AMT as average_submitted_chrg_amt,
            AVERAGE_MEDICARE_PAYMENT_AMT as average_Medicare_payment_amt,
            {year} as year
        FROM read_csv('{path}', delim='\t', header=true, quote='"', ignore_errors=true, null_padding=true)
        WHERE NPI != '0000000001'
    """)

for year in [2014, 2015]:
    path = f"{BASE}/Medicare_Provider_Util_Payment_PUF_CY{year}/Medicare_Provider_Util_Payment_PUF_CY{year}.txt"
    con.execute(f"""
        CREATE OR REPLACE VIEW medicare_{year} AS
        SELECT npi, nppes_provider_last_org_name, nppes_provider_first_name,
               nppes_provider_gender, nppes...
Output
File created successfully at: /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/03_additional_leads.py
Bash
uv run --with duckdb --with pandas outputs/medicare/03_additional_leads.py 2>&1
Output
================================================================================
PNEUMOCOCCAL VACCINE EXPLOSION (90670): $3.4M -> $800M
================================================================================
   year  providers  total_services  total_beneficiaries  total_payments  avg_payment_per_svc
0  2012        880         25670.0              25532.0       3424636.0               132.50
1  2013       2626         90698.0              90029.0      12603625.0               138.30
2  2014      13970        509434.0             508075.0      75688881.0               148.35
3  2015      77468       4867462.0            4855998.0     800276942.0               164.37

--- Top provider types billing 90670 in 2015 ---
                     provider_type  providers  total_services  total_payments
0                Internal Medicine      19135       1748398.0     286735514.0
1                  Family Practice      24102       1535624.0     251494886.0
2  Mass Immunization Roster Biller      14936        607987.0     100244072.0
3                  Centralized Flu       9104        565977.0      94728678.0
4               Nurse Practitioner       4102        124896.0      20575281.0
5                Pulmonary Disease       1453         70208.0      11420847.0
6              Physician Assistant       1634         50451.0       8294571.0
7               Geriatric Medicine        259         20246.0       3329333.0
8                 General Practice        349         20279.0       3326205.0
9              All Other Suppliers        337         18834.0       3145369.0

================================================================================
DENOSUMAB (J0897) GROWTH: $177M -> $356M
================================================================================
   year  providers  total_services  total_payments  avg_payment
0  2012       3770      15579115.0     177050958.0        11.32
1  2013       5095      20721527.0     228452443.0        10.95
2  2014       6295      25917699.0     288023618.0        11.03
3  2015       7293      30823205.0     356262781.0        11.45

================================================================================
DEEP DIVE: LAGANKE (top individual biller 2015, $13.3M, Neurology, AL)
================================================================================
   year  total_payment  total_services  total_benes  num_procedures
0  2012      3924583.0        389968.0       2110.0              27
1  2013      4745074.0        452748.0       2057.0              27
2  2014      6473224.0        529576.0       2247.0              26
3  2015     13305836.0        510834.0       3254.0              33

--- Top procedures for Laganke (2015) ---
   hcpcs_code                                                                                                         hcpcs_description hcpcs_drug_indicator  services  beneficiaries  total_payment  avg_payment
0       J2323                                                                                              Injection, natalizumab, 1 mg                    Y  419700.0            213      5172580.0        12.32
1       J3590                                                                                                    Unclassified biologics                    N     282.0             61      4561729.0     16176.35
2       Q9979                                                                                              Injection, alemtuzumab, 1 mg                    Y    1404.0             35      1870522.0      1332.28
3       J1561                                   Injection, immune globulin, (gamunex-c/gammaked), non-lyophilized (e.g. liquid), 500 mg                    Y   35630.0             45      1113732.0        31.26
4       96413                                                                         Infusion of chemotherapy into a vein up to 1 hour                    N    1845.0            273       172875.0        93.70
5       J0585                                                                                     Injection, onabotulinumtoxina, 1 unit                    Y   31620.0             59       119472.0         3.78
6       99214                                                Established patient office or other outpatient, visit typically 25 minutes                    N    1574.0            795       107043.0        68.01
7       J0586                                                                                    Injection, abobotulinumtoxina, 5 units                    Y    4660.0             13        28338.0         6.08
8       96415                                                                                      Infusion of chemotherapy into a vein                    N    1201.0             96        24056.0        20.03
9       95816                                                  Measurement and recording of brain wave (EEG) activity, awake and drowsy                    N      92.0             89        22278.0       242.15
10      96365                                                   Infusion into a vein for therapy, prevention, or diagnosis up to 1 hour                    N     435.0             62        21284.0        48.93
11      96366                                                                Infusion into a vein for therapy, prevention, or diagnosis                    N    1113.0             49        15164.0        13.62
12      J0588                                                                                   Injection, incobotulinumtoxin a, 1 unit                    Y    4705.0             12        15088.0         3.21
13      64616  Injection of chemical for destruction of nerve muscles on one side of neck excluding voice box accessed through the skin                    N     115.0             45        13204.0       114.82
14      96367                      Infusion into a vein for therapy prevention or diagnosis additional sequential infusion up to 1 hour                    N     443.0             98         9550.0        21.56

================================================================================
DEEP DIVE: FOJTASEK (highest svcs/bene, 1781, Infectious Disease, TX)
================================================================================
   year  total_payment  total_services  total_benes  svcs_per_bene
0  2012        72979.0          1012.0        373.0            2.7
1  2013        32430.0           416.0         97.0            4.3
2  2014      2641238.0       3377317.0       1700.0         1986.7
3  2015      4703150.0       5750425.0       3228.0         1781.4

--- Top procedures for Fojtasek (2015) ---
   hcpcs_code                                                                                                        hcpcs_description hcpcs_drug_indicator   services  beneficiaries  total_payment  avg_payment
0       J0878                                                                                              Injection, daptomycin, 1 mg                    Y  5313837.0            237      3040226.0         0.57
1       J1335                                                                                      Injection, ertapenem sodium, 500 mg                    Y    10169.0            202       305161.0        30.01
2       J1561                                  Injection, immune globulin, (gamunex-c/gammaked), non-lyophilized (e.g. liquid), 500 mg                    Y     6368.0             14       199929.0        31.40
3       J0712                                                                                    Injection, ceftaroline fosamil, 10 mg                    Y    90360.0             21       140205.0         1.55
4       J1267                                                                                              Injection, doripenem, 10 mg                    Y   182132.0             41        99515.0         0.55
5       36571                   Insertion of central venous catheter for infusion with port beneath the skin, patient 5 years or older                    N       94.0             88        95541.0      1016.40
6       99232                                                         Subsequent hospital inpatient care, typically 25 minutes per day                    N     1561.0            199        85411.0        54.72
7       99310                                                          Subsequent nursing facility visit, typically 35 minutes per day                    N      717.0             77        75010.0       104.62
8       99495  Transitional care management services, moderately complexity, requiring face-to-face visits within 14 days of discharge                    N      508.0            355        64661.0       127.28
9       99309                                                          Subsequent nursing facility visit, typically 25 minutes per day                    N      807.0             86        57033.0        70.67
10      J3370                                                                                        Injection, vancomycin hcl, 500 mg                    Y    21521.0            199        51150.0         2.38
11      96369                                                              Infusion into tissue for therapy or prevention up to 1 hour                    N      340.0            121        50976.0       149.93
12      J0692                                                                                Injection, cefepime hydrochloride, 500 mg                    Y    21090.0            112        40762.0         1.93
13      J2185                                                                                             Injection, meropenem, 100 mg                    Y    47042.0             66        38035.0         0.81
14      J2020                                                                                              Injection, linezolid, 200mg                    Y     1176.0             14        34786.0        29.58

================================================================================
WEST DES MOINES OPHTHALMOLOGY CLUSTER
================================================================================
           npi      last_name   first_name  total_payment  total_services  total_benes
0   1124010913         SAGGAU        DAVID      7751346.0         31574.0       7468.0
1   1437360351        ALLIMAN         KYLE      7460843.0         29900.0       6986.0
2   1467409383        NIELSEN        JARED      7378290.0         27832.0       6738.0
3   1780652800        HAUPERT  CHRISTOPHER      2019632.0          8231.0       3093.0
4   1689642290          BLODI  CHRISTOPHER      1502081.0          7981.0       3215.0
5   1225098759       HEILSKOV         TODD      1461812.0          8000.0       3379.0
6   1942292776        JOHNSON       STEVEN       777738.0          6634.0       4100.0
7   1922216621          RAUEN      MATTHEW       614252.0          5576.0       3741.0
8   1225246838  STEFFENSMEIER       ANDREW       400822.0          4162.0       3562.0
9   1114906518         CASADY      DOUGLAS       261212.0          5473.0       2155.0
10  1427018423          BROWN    ELIZABETH       188005.0          2317.0       2033.0
11  1437118064        KOUNKEL      VALERIE       116727.0          1675.0       1423.0
12  1821057456          YOUNG   JEFFRIANNE        67661.0          1086.0        817.0
13  1275593279          DWYER        DAVID        64558.0           776.0        729.0
14  1861644817       AMSPAUGH         BETH        38303.0           668.0        568.0
15  1922325448      PARLITSIS       GEORGE        25021.0           261.0        230.0

================================================================================
PLYMOUTH MEETING, PA OPHTHALMOLOGY CLUSTER
================================================================================
           npi   last_name first_name  total_payment  total_services  total_benes
0   1124076930      KAISER    RICHARD      7843836.0         34561.0       6771.0
1   1922056712          HO      ALLEN      6901381.0         29574.0       7085.0
2   1457325086       SPIRN       MARC      5952156.0         24678.0       5821.0
3   1780632570      VANDER      JAMES      5941114.0         26611.0       6798.0
4   1740248038     REGILLO       CARL      5650683.0         27426.0       7142.0
5   1013965276     FINEMAN   MITCHELL      5404598.0         23355.0       6896.0
6   1942274535        PARK       CARL      4919921.0         23106.0       5739.0
7   1427056134        GARG      SUNIR      4761417.0         22693.0       6567.0
8   1700850179  SIVALINGAM     ARUNAN      4313814.0         21186.0       5986.0
9   1326268475       GUPTA      OMESH      4106135.0         20305.0       5469.0
10  1548218787         HSU      JASON      3968072.0         17062.0       6479.0
11  1275538902     MAGUIRE     JOSEPH      3778173.0         19813.0       5306.0
12  1326212960      CHIANG      ALLEN      3195404.0         15076.0       4830.0
13  1902004401       MEHTA      SONIA      2095083.0         10852.0       3833.0
14  1356309546       BROWN       GARY      1366804.0          6865.0       2456.0
15  1548267834      BAILEY     ROBERT       681697.0          4519.0       3881.0
16  1205809951       SARIN        LOV       559043.0          3488.0       1066.0
17  1437156726       WEBER        AMY       223973.0          2384.0       2062.0
18  1932360641      WISNER    DOUGLAS       196279.0          1573.0       1332.0
19  1184659310     NISSMAN     STEVEN       144619.0          1137.0       1013.0
20  1699778779       THORP          T        63402.0           881.0        751.0
21  1518954700        WULC      ALLAN        52471.0          3098.0        305.0
22  1063470359      TASMAN    WILLIAM        28931.0           510.0        369.0
23  1932140209        DUNN      JAMES        23712.0           334.0        203.0
24  1346558392      KLUFAS    MICHAEL         4960.0            66.0         58.0

================================================================================
CHELATION THERAPY (J0600, EDTA) — $1.3M to $4.1M growth
================================================================================
   year  providers  total_services  total_payments
0  2012         15         13166.0       1287127.0
1  2013         20         12971.0       1386888.0
2  2014         21         12772.0       1994058.0
3  2015         21         12515.0       4085722.0

--- Top chelation providers (2015) ---
           npi   last_name first_name          provider_type state           city  services  beneficiaries  total_payment
0   1487610697       RIBER       EZRA        Pain Management    SC       COLUMBIA     396.0            180      1004168.0
1   1871722892  STANISLAUS   JENNIFER        Pain Management    NC  WINSTON SALEM     332.0            146       743998.0
2   1689864852      AROCHO   MILAGROS  Obstetrics/Gynecology    WA           YELM    1665.0             37       736424.0
3   1811936891     BURKICH     ROBERT        Family Practice    GA       RINGGOLD    2310.0             83       406853.0
4   1306848809  ELLITHORPE       RITA       General Practice    CA         TUSTIN    1385.0            130       389200.0
5   1952395048    SPANGLER       GARY        Family Practice    TX      DICKINSON    1719.0             69       350067.0
6   1881698579     MERRITT    DOROTHY      Internal Medicine    TX     TEXAS CITY     329.0             12       148677.0
7   1902824378       KLINE    RICHARD         Anesthesiology    SC         CONWAY      56.0             45        65856.0
8   1386653913        BORN       ERIC        Family Practice    MI      PARCHMENT     249.0             15        47139.0
9   1437192465       ADAMS    CHARLES      Internal Medicine    GA       RINGGOLD     559.0             49        43239.0
10  1457542094     RALEIGH       MARY        Family Practice    CA         IRVINE     127.0             12        32360.0
11  1629002548    SOKOLOVA     YELENA           Rheumatology    NY       BROOKLYN     189.0             15        29843.0
12  1356317523        BERK      SCOTT        Family Practice    NJ        HAMPTON     575.0             35        26389.0
13  1447296694   HENDERSON       BRYN        Family Practice    CA         ORANGE    1190.0             91        24090.0
14  1982717237      PODVAL     MARINA      Internal Medicine    NY       FLUSHING      63.0             12        12348.0

================================================================================
PLACE OF SERVICE SHIFT: Facility (F) vs Office (O) over time
================================================================================
   year place_of_service  providers  total_services  total_payments
0  2012                F     497422    5.286530e+08    2.972935e+10
1  2012                O     618153    1.665415e+09    4.766470e+10
2  2013                F     509113    5.332172e+08    2.940986e+10
3  2013                O     633052    1.656036e+09    4.738924e+10
4  2014                F     519326    5.378801e+08    3.006559e+10
5  2014                O     648284    1.732697e+09    4.815518e+10
6  2015                F     531121    5.419843e+08    3.054675e+10
7  2015                O     667532    1.817310e+09    5.007696e+10

================================================================================
CHARGE-TO-PAYMENT GAP — Growing provider frustration?
================================================================================
   year  total_charges_billions  total_allowed_billions  total_payments_billions  unpaid_gap_billions  charge_to_payment_ratio
0  2012                  252.40                   99.32                    77.39               175.01                     3.26
1  2013                  262.48                  100.22                    76.80               185.68                     3.42
2  2014                  271.75                  102.03                    78.22               193.53                     3.47
3  2015                  286.41                  104.82                    80.62               205.79                     3.55
Claude

Now I have all the data I need. Let me compile the tipsheet.

Write
/Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/tipsheet.md
File content
# Tipsheet: Medicare Provider Billing Patterns, 2012-2015

## Source Material
- **Data**: CMS Medicare Provider Utilization and Payment Public Use Files, CY2012-CY2015
- **Scope**: 37.3 million provider-procedure rows across 4 years; 880K-968K unique providers per year
- **Structure**: Each row is one provider-procedure combination with service counts, charges, and Medicare payments
- **Analysis date**: March 2026
- **Coverage notes**: All four years fully analyzed using DuckDB. 2012-2013 files have slightly different schemas (include STDEV columns, no standardized amount). PDF files in each directory are CMS/AMA CPT license agreements, not analytically relevant. All analyses use the formula `line_srvc_cnt * average_Medicare_payment_amt` to estimate total payments — this is an approximation since the data provides averages, not exact totals.
- **Analysis scripts**: `outputs/medicare/01_profile_and_macro.py`, `02_outliers_and_leads.py`, `03_additional_leads.py`

## Summary of Findings

Medicare Part B spending grew from $77.4B to $80.6B (4.2%) between 2012 and 2015, but the growth was wildly uneven. Drug spending surged 37% ($5.4B to $7.4B), driven by a handful of expensive injectable medications, while cardiology payments collapsed 15% ($757M decline). The gap between what doctors charge and what Medicare pays widened from $175B to $206B — providers are now charging 3.55x what they're paid, up from 3.26x. Several individual providers billed Medicare over $8 million per year, often dominated by a single expensive drug. Geographic clusters — particularly in ophthalmology — suggest practice patterns worth investigating.

## Leads

### Lead 1: Pneumococcal Vaccine Spending Exploded 23,000% in Three Years
**Signal strength**: Strong

**Evidence**: HCPCS code 90670 (pneumococcal conjugate vaccine, Prevnar 13) went from $3.4M in total Medicare payments in 2012 to $800M in 2015 — a 23,268% increase. The number of providers billing this code went from 880 to 77,468. Services...
Output
File created successfully at: /Users/nrh146/Documents/agentic-investigation-newsworthiness-dataonly/outputs/medicare/tipsheet.md
Claude

The tipsheet is saved at outputs/medicare/tipsheet.md, with all analysis scripts alongside it. Here's the summary:

Top Leads from 37M Medicare Records (2012-2015)

Lead 1 (Strong): Pneumococcal vaccine spending exploded 23,000% — from $3.4M to $800M in three years after a new ACIP recommendation. Is Medicare overpaying per dose compared to the VA or international buyers?

Lead 2 (Strong): The $206B charge gap is widening — providers now charge 3.55x what Medicare pays, up from 3.26x. This hits non-Medicare patients who get billed at these inflated rates.

Lead 3 (Strong): Injectable drug spending surged 37% while overall Medicare grew just 4.2%. Ophthalmology drugs (Eylea at ~$767/injection, Lucentis at ~$308) dominate — despite clinical trials showing $50/injection Avastin works equally well.

Lead 4 (Strong): Ophthalmology billing clusters — three eye doctors in West Des Moines, IA each billed $7-8M, and 10 in Plymouth Meeting, PA billed $3-8M each, mostly for these expensive eye injections.

Lead 5 (Moderate): Cardiology payments collapsed 15% ($757M decline) — nuclear imaging and stress tests fell 30-48%, likely reflecting "appropriate use" pushback against overtesting.

Lead 6 (Moderate): Extreme overutilization signals — one infectious disease doctor billed 1,781 services per patient (5.75M total services), jumping from $73K to $4.7M in two years.

Lead 7 (Moderate): Nurse practitioner billing surged 66% — the fastest-growing provider type, with dramatic state-by-state variation likely tied to scope-of-practice laws.

Lead 8 (Preliminary): Chelation therapy costs tripled to $4.1M despite flat utilization — only 21 providers bill for this controversial treatment.

Lead 9 (Moderate): A neurologist in rural Alabama (pop. 15,000) was the #1 individual Medicare biller at $13.3M, tripling in 3 years, with $4.6M billed under an opaque "unclassified biologics" code.

The strongest story angle may be Lead 3+4 combined: ophthalmologists choosing $300-770/injection drugs over a clinically equivalent $50 alternative, enabled by Medicare's ASP+6% reimbursement that rewards more expensive drugs — costing taxpayers over $2 billion/year for eye drugs alone.