← Blog
March 29, 2026

Every Dollar Counts, Every Vote Counts

Inside 350,000 congressional votes, a 2,000-year-old pattern, and the 21.3% of American lawmaking that correlates with who's writing the checks.

I. The Vote

One fifth of how Congress votes tracks with who paid for their campaign. That's the number. I built a model, fed it 36 million campaign contributions and 2.4 million congressional votes, and asked it to decompose every vote into its explanatory parts: ideology, party loyalty, bill content, district demographics, committee assignments, and money.1 The money part came back at 21.3%. Which is, before you do anything with that number, genuinely complicated, because the other four fifths is exactly what you'd want it to be. A Republican from rural Alabama votes like a Republican from rural Alabama. A Democrat from coastal California votes like a Democrat from coastal California. Party loyalty and ideology explain the bulk of it. Congress is not a vending machine. The model, for the most part, is bored.

But every so often a vote comes back weird.

[Timeline graphic: 63 BC – 1896 – 2026]

"We may have democracy, or we may have wealth concentrated in the hands of a few, but we cannot have both." — Brandeis
"The bribed voters hid. The man who refused was cheered home." — Plutarch, on Cato, c. 100 AD
"The first is money, and I can't remember what the second one is." — Mark Hanna, 1896
"We have the best Congress money can buy." — Will Rogers
"The scandal is what's legal." — Robert Kaiser, Washington Post
"The distinction between a large campaign contribution and a bribe is almost a hairline's difference." — Sen. Russell Long, 1986
"Could be, maybe, not totally sure, I mean duh, though." — Author, 2026

Representative Ritchie Torres, a Democrat from the South Bronx who represents what is statistically the poorest congressional district in the country, voted for multiple crypto regulation bills in the 119th Congress, including the GENIUS Act and the Digital Asset Market Structure bill. Torres has an articulated theory for why this makes sense for his district: he argues that decentralized finance and dollar-denominated stablecoins are tools for financial inclusion in underbanked communities, that the project of building a cheaper, faster payment system is fundamentally progressive, and that his constituents stand to benefit from exactly the kind of financial infrastructure these bills would create. He also co-founded the Congressional Crypto Caucus and received over $129,000 in crypto industry contributions and another $173,000 in crypto-aligned super PAC support during the 2024 cycle. I ran those votes through the model. It said: 39.1% financial. 40.8% bill content. Ideology barely registers. Party loyalty barely registers. The model thinks money and the substance of the legislation together explain roughly 80% of this prediction, and the factors that normally dominate congressional voting (the ones that make the model bored) explain almost nothing.

Now. I want to be careful here, because (and I'm going to say this several times in this piece, enough times that you'll get tired of hearing it, which is fine because the alternative is that I say it once and you forget and then send me an email) correlation is not causation. The model cannot tell you that Torres voted for crypto legislation because of campaign donations from the digital asset industry any more than it can tell you that the industry gave him money because they knew he was already inclined to support their bills. Both explanations produce identical data signatures. The model sees the correlation, flags it, quantifies it to two decimal places, and then moves on with the supreme indifference of an algorithm that does not understand what money is or why people want it. That is a limitation of gradient-boosted trees and also, as it turns out, of the entire field of campaign finance research going back roughly to the invention of campaigns.2

Torres is one vote. The model has 350,000 of them. Here are a few where the financial signal was loudest:

Highest Financial Signal on Party Defections

Votes where a member broke from their party and the model's money attribution was in the top decile. Dashed line shows the 21.3% average across all votes.

Victoria Spartz (R-IN), Defense Appropriations: 45.3%
Sheldon Whitehouse (D-RI), Presidential Nomination: 41.9%
Richard Blumenthal (D-CT), CRA Disapproval: 41.2%
John Thune (R-SD), Eliminate Shutdowns Act: 41.1%
Ritchie Torres (D-NY), GENIUS Act (crypto): 39.1%

(Values are money attribution as a percent of the model's prediction; the overall average is 21.3%.)

II. The Number

Here is the headline finding: across all votes cast by all 539 current members of Congress, the model's mean money attribution is 21.3%. Financial features (donor industry concentration, PAC ratios, in-state vs. out-of-state money, dark money exposure, and about sixty other variables that I will spare you the enumeration of) account for roughly a fifth of the model's prediction of how a member of Congress will vote on any given bill.

I want to sit with that for a second, because I think the instinct (my instinct, anyway, and maybe yours) is to either round it up to "money controls everything" or round it down to "that's not that much." Both of those are wrong, or at least wrong-ish, in ways I think matter.

On the "round it up" side: nearly 80% of the variance is explained by things that have nothing to do with money. Ideology. Party loyalty. What the bill actually says. Whether the member is on a relevant committee. What their district looks like. Congress is not a vending machine where you insert a campaign contribution and a vote comes out. Most votes, the vast majority of votes, go the way they would go if campaign finance did not exist at all. The model is quite clear about this.

On the "round it down" side: 21.3% is not zero. And the 21.3% is not distributed evenly. It clusters. It clusters around certain types of members, certain types of votes, and certain moments in a legislator's career. The clusters are where the stories are.

III. The Tenure Curve

This is, I think, the most interesting thing in the data, and I'll confess that I didn't expect to find it.

When you break money attribution down by how long a member has served, a pattern emerges that looks less like a straight line and more like a hill.3 The cynical interpretation is that freshmen arrive idealistic, then gradually learn to follow the money, and by their third or fourth term they're fully captured. But if that were true, the curve would keep going up. It doesn't. It peaks mid-career and comes back down. The longest-serving members of Congress are less financially predictable than mid-career ones.

The Tenure Curve

Average money attribution by term number. Both chambers peak mid-career, then decline.

House: peaks at 22.5% mid-career. Terms binned as 1st (0–2y), 2nd (2–4y), 3rd (4–6y), 4th (6–8y), 5th (8–10y), 6th+ (12y+).
Senate: peaks at 23.63% mid-career. Terms binned as 1st (0–6y), 2nd (6–12y), 3rd (12–18y), 4th+ (18y+).
Reference line: 21.3% overall average.
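The aggregation behind the tenure curve is, in spirit, a grouped average. Here is a minimal sketch with pandas; the column names (`term_number`, `money_attribution`) and the toy values are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

# Toy stand-in for the per-vote attribution table; values are invented.
votes = pd.DataFrame({
    "chamber": ["House", "House", "House", "Senate", "Senate", "Senate"],
    "term_number": [1, 3, 7, 1, 2, 4],
    "money_attribution": [0.190, 0.225, 0.200, 0.210, 0.236, 0.220],
})

# Pool everything past the 6th term together, mirroring the chart's
# "6th+" bucket.
votes["term_bucket"] = votes["term_number"].clip(upper=6)

# Average money attribution per chamber and tenure bucket.
curve = (
    votes.groupby(["chamber", "term_bucket"])["money_attribution"]
    .mean()
    .reset_index()
)
print(curve)
```

The real pipeline presumably does the same thing over hundreds of thousands of rows; the shape of the hill falls out of a single groupby.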

Which raises a question the model can't answer but that I think the shape of the curve strongly implies: what if the hill isn't about corruption at all, but about leverage?

Think about it from the donor's perspective (which is, I realize, a slightly unsettling invitation, like being asked to think about things from the mosquito's perspective, but bear with me). A first-term member is a bad investment. They don't have committee assignments yet, they don't know the procedural levers, they might lose their next election, and they're still figuring out which hallway the bathroom is in. You wouldn't buy stock in a company that hasn't shipped a product. By the third or fourth term, though, the member is established, reelectable, probably on a committee that matters, and (this is the key part) still hungry. They need money to consolidate. They need money to fend off primaries. They need money because the cost of a House race has roughly tripled since 2000, and nobody gets to stop fundraising, ever, which is a feature of the system that nobody designed but that everybody perpetuates in the way that a traffic jam is caused by nobody in particular.

But then something happens around the fifth or sixth term. The member becomes an institution. They chair committees. They have leadership positions, name recognition, and donor networks that fundraise themselves. And here's where it gets interesting: a committee chair doesn't need a particular donor's money the way a third-term backbencher does, but a particular donor very much needs the committee chair's vote. The power asymmetry flips. The money still flows, but it flows uphill now, toward power rather than creating it, and the model (which can detect correlation but not direction) sees this as a decrease in financial signal because the money is no longer predictive of anything the member wouldn't have done anyway.

Or to put it less charitably: maybe the longest-serving members aren't less influenced by money. Maybe they're just better at making sure the money never has to ask.

And the hill, it turns out, has two different slopes depending on which side of the aisle you're standing on.

The Hill Has Two Slopes

Money attribution by term and party. The first reelection cycle changes both parties, but not equally.

Both parties start near the 21.3% overall average as freshmen. After the first reelection, Democrats jump about +3.0 points and Republicans about +5.0; the Republican curve then plateaus near 22.82% and the Democratic curve near 21.32%, a roughly 2-point gap that holds at every subsequent tenure level.

Both parties start in roughly the same place as freshmen. Then the first reelection happens, and they diverge. Democrats jump and flatten. Republicans jump nearly twice as far and stay elevated. The gap between the two curves persists at every tenure level after that, regardless of seniority, and I don't have a clean explanation for why. The donor ecosystems of the two parties are structurally different (Republican candidates rely more heavily on corporate PACs and industry money; Democratic candidates have shifted toward small-dollar fundraising), and the model may be picking up that structural asymmetry rather than a behavioral one. Or Republicans may genuinely be more responsive to financial signal. The model, as usual, declines to adjudicate.

And then there are the freshmen who arrive already captured. The lowest-yield first-termers in the 119th Congress (Walkinshaw (D-VA) at 12.6%, Eric Schmitt (R-MO) and Husted (R-OH) at 14.7%) look like what you'd expect: new members whose donors haven't had time to build predictive patterns. But the highest-yield freshmen (Derek Schmidt (R-KS) at 24.3%, Moore (R-WV) at 23.9%) show financial signal from day one that exceeds the average for members who've served a decade. They didn't learn to follow the money. They arrived pre-aligned with it. Which suggests that for at least some members, the tenure curve is less about corruption over time and more about money selecting for compliant candidates before the first vote is ever cast.4

IV. Where Money Matters Most

The 21.3% average conceals some pretty significant variation depending on what kind of vote you're looking at.

Presidential nominations, for instance, show the highest financial signal at 23.4%. This makes a certain intuitive sense: judicial and executive nominations are policy decisions with long time horizons and enormous downstream consequences for specific industries, and the donors who care about them tend to care intensely. When a member of the Senate Finance Committee votes on a Treasury nominee who will oversee banking regulation, the industries affected by that regulation have a very concrete interest in the outcome, and that interest shows up in the model.

The more subtle finding is about contestedness. You might assume that the closest votes (margins under 10%) would show the highest money attribution, on the theory that money matters most when the outcome is in doubt. The data says otherwise. Moderate-margin votes (10-30% margin) show the highest financial signal at 22.4%, compared to 21.3% for tight votes. My reading of this (and I want to flag that this is interpretation, not finding) is that the tightest votes are the ones where party leadership is applying maximum pressure and every other factor is cranked to maximum, which drowns out the financial signal. The moderate-margin votes are the ones where the outcome is mostly settled but there's room at the edges for individual members to go their own way, and that's where the money shows up. Not in the theatrical cliffhangers. In the votes nobody's watching.5
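The contestedness breakdown is a straightforward binning exercise. A sketch of that computation, with made-up margins and attribution values and assumed column names:

```python
import pandas as pd

# Invented sample of votes: winning margin (in points) and the model's
# money attribution for each. Column names are assumptions.
df = pd.DataFrame({
    "margin_pct": [4, 8, 15, 22, 28, 45, 60],
    "money_attribution": [0.210, 0.215, 0.224, 0.225, 0.223, 0.200, 0.190],
})

# Bucket votes by contestedness: tight (<10 points), moderate (10-30),
# lopsided (>30) -- the cut points used in the text.
df["contestedness"] = pd.cut(
    df["margin_pct"],
    bins=[0, 10, 30, 100],
    labels=["tight", "moderate", "lopsided"],
)

by_margin = df.groupby("contestedness", observed=True)["money_attribution"].mean()
print(by_margin)
```

With real data, the claim in the text is that the "moderate" bucket comes out on top, not the "tight" one.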

And then there's the committee effect. Members of the Finance Committee show 22.5% money attribution; members not on that committee show 21.2%. Health committee members: 22.1% vs 21.2%. These are modest gaps (a point or so) but they're consistently positive across every major committee, and they go in the direction you'd predict: the committees that oversee the industries spending the most money are the ones where the financial signal is strongest. The one interesting exception is the Energy committee, which shows essentially no gap at all (21.4% vs 21.3%), which is either a data artifact or an indication that energy money operates differently from financial and healthcare money, a question I do not have the data to answer and which someone with more domain expertise than me should probably investigate.

V. The Oldest Trick

Before I go any further into the data, I want to back up and acknowledge something that I think the data, on its own, cannot communicate, which is that none of this is new. The specific numbers are new. The model is new. The ability to decompose individual votes into attributive factors is (I think) genuinely novel. But the pattern (private money flowing toward political power, political power flowing toward favorable treatment of the money, and everybody involved maintaining a vocabulary designed to describe this as something other than what it obviously is) is very, very old.

The Roman Republic had a word for electoral bribery: ambitus. They also had a word for generosity toward voters: benignitas. Cicero himself drew the distinction. In practice, nobody could tell them apart, which was more or less the point. Roman elections ran on free food, entertainment, and hard cash; candidates walked the city with a nomenclator (essentially a human Rolodex) whispering voter names so they could greet everyone personally; and the Senate kept passing increasingly severe anti-bribery laws that somehow never worked. The Lex Baebia of 181 BC. The Lex Acilia Calpurnia of 67 BC. Cicero's own Lex Tullia of 63 BC. Each one raised the penalties. Each one was circumvented within a year. Eventually the anti-corruption laws themselves became weapons in the power struggles they were designed to prevent: Pompey shortened bribery trials to three hours and then used the expedited process to prosecute his political enemies. When his own father-in-law was accused under the same law, Plutarch relates that Pompey put on mourning clothes and summoned the jurors to his house. The charges were dropped.6

The American version of this pattern starts in 1896, when a Cleveland industrialist named Mark Hanna systematically assessed banks and corporations a percentage of their assets to fund William McKinley's presidential campaign against William Jennings Bryan. He didn't ask for donations. He assessed them. Like taxes, except the taxes went to electing a president who would be friendly to the people paying them. Hanna raised the equivalent of roughly $150 to $200 million in today's dollars. When asked about money and politics, he reportedly said there were two things that mattered: "The first is money, and I can't remember what the second one is."7 Before Hanna, corporate involvement in elections was ad hoc. After Hanna, it was systematic, professionalized, and scaled. Every subsequent era of campaign finance is essentially a variation on the infrastructure he built.

The mechanics have changed since 1896. The Supreme Court has opened and closed various loopholes (Buckley8, McCain-Feingold, Citizens United, SpeechNow). But the total outside spending in federal elections tells the structural story more clearly than any case law:

The Ratchet

Outside spending in federal elections, 1990–2020.

Source: OpenSecrets

What the chart shows, and what I think the Romans would recognize instantly, is the ratchet. Each scandal produces a reform. Each reform creates a workaround. Each workaround becomes the new normal. The baseline only goes in one direction.

VI. What I Don't Know

I want to be honest about what this data can and can't tell you, because I think there's a version of this post (a version I was tempted to write, if I'm being transparent about my own rhetorical impulses) that presents the 21.3% number as a definitive indictment of the campaign finance system, cherry-picks the most dramatic individual defections, and lets the reader walk away with their priors confirmed. That version would get more engagement. It would also be misleading.

Here is what the model actually tells you: financial features are statistically associated with vote predictions at a rate that a gradient-boosted tree finds meaningful, after controlling for everything else I could think of to control for. It says this association is stronger for mid-career members than for freshmen or senior members. It says the association is stronger in the Senate than in the House (24.4% vs 21.6%). It says the association is stronger on nominations and moderate-margin votes than on nail-biters. It says that Republicans show higher financial signal than Democrats across every single policy domain, not just the ones you'd expect (the gap is widest on procedural votes, at 2.3 points, which is genuinely puzzling and which I do not have a good explanation for). And it says that members are 2.4 times more likely to defect from their party on bills that were actively lobbied than on bills that weren't.9

There is also, I should note, a hole in the methodology that I haven't figured out how to close. The model measures how members vote. It does not measure whether members show up. When I broke attendance down by vote type, a pattern emerged: House members skip contested votes (the close ones, the ones where the lobbying pressure is highest) at rates 4 to 5 percentage points higher than they skip easy votes. Senators show the opposite pattern; they show up more for contested votes than routine ones. What this means is that some House members with low financial signal in the model may not be clean so much as absent. They weren't in the room when the pressure was on, so the model never saw them tested. A legislator who votes 95% of the time and still shows low financial influence has demonstrated something. A legislator who votes 60% of the time and shows low financial influence might just be hiding.10

What the model does not tell you is why any of this is true. The causal mechanism remains, and I think will always remain, genuinely ambiguous. Does money change votes, or does money follow ideology? Is the tenure curve evidence of capture, or evidence of donor selection? When Ritchie Torres votes with Republicans on crypto legislation and the model says 39% of the prediction is financial, does that mean the donations caused the vote, or that Torres was always inclined toward crypto-friendly policy and the donations are just a correlate of that inclination? The model shrugs. So do I.

But here is the thing that kept me coming back to this project, the thing that I think makes the 21.3% number worth knowing even in the absence of causal certainty: the data is all public. Every dollar, every vote, every ideology score. It's all public. And yet functionally, in terms of any normal person's ability to sit down and connect the money to the votes, it might as well be locked in a vault. The contributions live in one database, the votes in another, the industry classifications in a third, the lobbying filings in a fourth. The plumbing is the hard part. I did the plumbing.11 12

Whether 21.3% is too much is a question I don't think a model can answer. It's a question about what kind of system you want to live in, and what level of financial influence over democratic decision-making you're willing to accept, and whether a structural pattern that Cicero would recognize and that Mark Hanna professionalized and that Citizens United turbocharged is something you consider a problem or a feature. The model just gives you the number. The interpretation is yours.

Footnotes

  1. The project is called the Congressional Yield Index, which is a pun on "yield" as in bond yield (the return you get on an investment) and "yield" as in what a legislator does when they yield to pressure. I'm unreasonably pleased with this name. For the technically curious, here's the model card as of this writing:

    Model: XGBoost, two-stage
    Features: 239 (62 financial, 50 bill, 49 knowledge base, rest other)
    Training data: 2.3M votes, 36M contributions, 460K lobbying filings
    Accuracy: 90.8% (vs. 95.2% baseline)
    Defection AUC: 0.765
    Money attribution: 21.3% mean, via SHAP

    "Money attribution" means the share of the model's prediction for a given vote that is driven by financial features (donor industry concentration, PAC ratios, lobbying pressure, employer HHI, dark money exposure, etc.) rather than ideology, party, bill content, district, or institutional position.

  2. The technical term for this ambiguity in the political science literature is the "selection vs. influence" problem, and it has been the subject of approximately ten thousand published papers, none of which have resolved it. The best summary I've seen is from a researcher who said the field's consensus position is "probably both, in proportions that vary by member and by vote, in ways we cannot measure." Which is, if you think about it, a very expensive way of saying "we don't know."

  3. The Senate numbers are noisier because you're working with much smaller sample sizes (14 first-term senators vs. 64 first-term House members), so I'd weight the House curve more heavily. The shape, though, is the same in both chambers, which I find suggestive if not conclusive.

  4. There is a running theme in this piece where the model identifies a pattern and then declines to explain it. This is, I've come to believe, the fundamental experience of working with machine learning on social science questions: you get better and better at describing what is happening, and no better at all at understanding why.

  5. The highest individual money attribution in the entire dataset is 45.3%, belonging to Representative Victoria Spartz of Indiana on a defense appropriations vote. 45.3% is remarkably high; it means the model thinks nearly half the prediction of her vote comes from financial features alone. Spartz defected from her party on the vote. I mention this not to single her out but because it illustrates the range: the average is 21.3%, but the tails extend much further than you'd expect.

  6. The vocabulary changes, the structure doesn't. The Romans had benignitas. We have "constituent service." The conceptual distinction between influence and corruption is maintained, in every era, by the people who benefit most from its fuzziness.

  7. There's some scholarly debate about whether Hanna actually said this verbatim or whether it's been polished by repetition into a form punchier than the original. This is true of most good political quotes, which exist in a kind of superposition between "historically documented" and "too good to fact-check" until someone with a PhD forces the waveform to collapse.

  8. Buckley v. Valeo (1976) is the load-bearing wall of the whole structure. The Court ruled that spending money on political campaigns is constitutionally protected speech, but drew a line: you can limit what someone gives directly to a candidate (contributions), because those create a risk of quid pro quo corruption, but you cannot limit what someone spends independently about a candidate (expenditures), because that's speech and the First Amendment has feelings about speech. This distinction (contributions: regulable; expenditures: protected) has governed every subsequent case and is the reason outside spending can grow without limit while direct donations remain capped at figures that a serious donor would consider a rounding error. I'm simplifying, and I know I'm simplifying. Buckley is a genuinely complicated decision with defensible reasoning on multiple sides, and the fact that it has produced a system where you can't give a congressman more than $3,300 but you can spend $40 million on ads supporting him as long as you don't "coordinate" (a word doing more load-bearing work than any word in American law should have to) is something the Court probably did not fully anticipate. Or maybe they did. The opinion is 294 pages long and I have read a portion of it, which I mention not to impress you but to explain why I spent a week in a mood.

  9. The lobbied-bill finding comes with a significant asterisk: bills that attract lobbying attention are, by definition, more consequential and more contentious than bills that don't, which means the higher defection rate could be driven entirely by the bills' inherent contentiousness rather than the lobbying itself. The model can't disentangle these. A controlled experiment would require randomly assigning lobbying to bills, which is (I checked) not something the Senate Office of Public Records is currently set up to accommodate.

  10. This is an area where the model's honesty about its own blind spots matters. I'm working on an attendance-adjusted metric that accounts for strategic absenteeism, but it requires distinguishing between "didn't vote because they were traveling" and "didn't vote because they didn't want to be on the record." The former is noise. The latter is signal. Telling them apart from roll-call data alone is, so far, not something I've cracked.

  11. The taxonomy used to connect donors to industries is maintained by OpenSecrets and uses a format of one sector letter plus four digits: A1500 is Dairy, F2100 is Hedge Funds, D1000 is Defense Aerospace.
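For anyone wiring up that taxonomy, the code format is mechanical enough to validate with a regex. A tiny parser sketch, seeded only with the three codes named above (the lookup table and function name are mine, not OpenSecrets'):

```python
import re

# OpenSecrets-style industry codes: one sector letter + four digits.
CODE_RE = re.compile(r"^([A-Z])(\d{4})$")

# Lookup seeded only with the examples from the footnote.
KNOWN = {
    "A1500": "Dairy",
    "F2100": "Hedge Funds",
    "D1000": "Defense Aerospace",
}

def parse_industry_code(code):
    """Return (sector_letter, digits, label_or_None); raise on bad input."""
    m = CODE_RE.match(code)
    if not m:
        raise ValueError(f"not a valid industry code: {code!r}")
    sector, digits = m.groups()
    return sector, digits, KNOWN.get(code)

print(parse_industry_code("F2100"))  # ('F', '2100', 'Hedge Funds')
```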

  12. The full data sources, for anyone who wants to build their own plumbing: Federal Election Commission (fec.gov/data/browse-data/) — individual contributions (Schedule A), PAC-to-candidate contributions, committee master file, independent expenditures (Schedule E). Congress.gov API (api.congress.gov) — bill metadata, status, subjects, summaries, full text, cosponsors. VoteView (voteview.com) — roll-call votes (member-level yea/nay/not voting per rollcall), DW-NOMINATE ideology scores. congress-legislators GitHub (unitedstates/congress-legislators) — current and historical legislator YAML (terms, parties, states, districts, dates). Senate SOPR / Lobbying Disclosure Act (lda.senate.gov/api/v1/) — LD-1 registrations (lobbying firm, client, lobbyists), LD-2 quarterly reports (client, amount, issue areas, bill references), lobbyist covered positions (former government roles). Interest group scorecards — NRA, LCV, AFL-CIO, Chamber of Commerce, Heritage Action, ACLU, NFIB, and others. ProPublica Nonprofit Explorer (projects.propublica.org/nonprofits/) — 501(c)(4) dark money organization identifiers. That's 7 source organizations providing roughly 15 distinct data feeds. All in, about 34GB of raw text.
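Most of the plumbing is ID reconciliation: FEC records key on FEC candidate IDs, VoteView keys on ICPSR numbers, and the congress-legislators YAML carries both, which lets you build a crosswalk. A sketch of that join with invented stand-in records shaped like that data; the field names follow the congress-legislators layout as I understand it, but treat them as assumptions rather than verbatim schema.

```python
# Invented legislator records shaped like the congress-legislators YAML:
# each member carries several identifiers under an "id" mapping.
legislators = [
    {"name": "Member A", "id": {"bioguide": "X000001", "fec": ["H0XX00001"], "icpsr": 11111}},
    {"name": "Member B", "id": {"bioguide": "X000002", "fec": ["S2XX00002"], "icpsr": 22222}},
]

# FEC candidate ID -> ICPSR crosswalk. One member can accumulate
# several FEC IDs across House and Senate runs, hence the inner loop.
fec_to_icpsr = {
    fec_id: leg["id"]["icpsr"]
    for leg in legislators
    for fec_id in leg["id"].get("fec", [])
}

# Toy contribution (FEC-keyed) and roll call (ICPSR-keyed) records.
contributions = [{"cand_id": "H0XX00001", "amount": 2500}]
rollcalls = [{"icpsr": 11111, "vote": "Yea"}]

# Join a contribution to the member's votes via the crosswalk.
for c in contributions:
    icpsr = fec_to_icpsr[c["cand_id"]]
    votes = [r for r in rollcalls if r["icpsr"] == icpsr]
    print(c["cand_id"], "->", icpsr, votes)
```

Scale that dictionary up to every member and every cycle and you have the spine that the other six data sources hang off of.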

machine learningpoliticscampaign financedata