NLP × Finance

Decoding Wall Street

How 40,000 news headlines reveal sentiment patterns, topic structure, and market signals across America's biggest companies.

40,112
Articles
15
Companies
5
Sectors
12
Months
5
LDA Topics
From Raw API Data to Insight
A four-stage NLP pipeline built on Finnhub's financial news API
01

Collect

40,112 articles via Finnhub API across 15 companies in Technology, Finance, Healthcare, Consumer, and Industrial sectors.

Finnhub API, Python
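The collection step can be sketched as a series of date-windowed requests to Finnhub's company-news endpoint. The 30-day windowing, the helper names, and the placeholder token below are assumptions for illustration; the page does not say how the 40,112 articles were actually paged.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://finnhub.io/api/v1/company-news"


def date_windows(start, end, days=30):
    """Split [start, end] into consecutive inclusive windows of at most `days` days.

    The 30-day default is an assumption, not a documented Finnhub limit.
    """
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def company_news_url(symbol, start, end, token):
    """Build the request URL for one ticker and one date window."""
    query = urlencode({
        "symbol": symbol,
        "from": start.isoformat(),
        "to": end.isoformat(),
        "token": token,  # hypothetical placeholder; use your own API key
    })
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    for start, end in date_windows(date(2024, 1, 1), date(2024, 3, 1)):
        print(company_news_url("AAPL", start, end, "YOUR_TOKEN"))
```

Requesting short windows per ticker keeps each response small and makes it easy to resume a partial download.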
02

Clean

Regex normalization, stopword removal (ISO + finance-specific), WordNet lemmatization. TF-IDF feature extraction per sector.

NLTK, scikit-learn
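A minimal plain-Python sketch of the cleaning and weighting steps follows. It is a stand-in, not the page's code: the stopword set is a tiny illustrative sample rather than the ISO and finance-specific lists, WordNet lemmatization is omitted, and the TF-IDF is the textbook tf × log(N/df) form rather than scikit-learn's smoothed, normalized variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); the real lists are much larger.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "as",
             "is", "stock", "shares"}


def clean(headline):
    """Lowercase, strip everything but letters, drop stopwords and short tokens."""
    text = re.sub(r"[^a-z\s]", " ", headline.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 2]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (term count / doc length) * log(N / document frequency)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    headlines = ["Apple's Q3 earnings beat the Street!",
                 "Bank earnings slide as rates fall"]
    print(tfidf([clean(h) for h in headlines]))
```

To get per-sector features, `tfidf` would be called once on each sector's headlines, so a term common in one sector is not penalized for being rare elsewhere.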
03

Analyze

Dual sentiment scoring with VADER and TextBlob. LDA topic modeling with 5 topics, 500 iterations on 39,293 documents.

VADER, TextBlob, LDA
04

Connect

OLS regression linking sentiment and topic features to monthly stock returns and quarterly earnings surprises.

statsmodels, OLS
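For intuition, the single-feature case of ordinary least squares can be written out by hand. This is a sketch under assumptions: the page's models were fit with statsmodels and presumably used several features at once, and the function name and toy numbers here are hypothetical.

```python
def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return intercept, slope, r2


if __name__ == "__main__":
    # Toy numbers (made up): monthly mean sentiment vs. monthly return.
    sentiment = [0.05, 0.08, 0.10, 0.12]
    returns = [-0.01, 0.00, 0.02, 0.03]
    print(fit_ols(sentiment, returns))
```

With one feature, R² equals the squared Pearson correlation, which is why the correlation figures on this page can be read as single-feature fits.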
The Mood of the Market
VADER compound scores point to slightly optimistic financial media coverage, with meaningful variation across companies
39.9%
Positive
42.6%
Neutral
17.4%
Negative
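The three shares above come from bucketing each headline's VADER compound score. The page does not state its cutoffs; the sketch below assumes the conventional ±0.05 thresholds commonly used with VADER, and the function names are hypothetical.

```python
from collections import Counter


def label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The +/-0.05 cutoff is the usual VADER convention, assumed here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_shares(compounds):
    """Return the fraction of scores falling in each label."""
    counts = Counter(label(c) for c in compounds)
    total = len(compounds)
    return {name: counts[name] / total
            for name in ("positive", "neutral", "negative")}


if __name__ == "__main__":
    print(label_shares([0.6, 0.0, -0.3, 0.05]))
```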

Average VADER Sentiment by Company

Key Sentiment Insights

r = 0.37
VADER vs TextBlob correlation: the modest agreement suggests the two scorers capture different dimensions of sentiment
0.105
Overall mean VADER compound: a slight positive skew, typical of financial news media
AMZN → AAPL
Most positive (0.146) to least positive (0.073): a 2× range across companies
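The VADER vs TextBlob agreement figure is a Pearson correlation between the two scores assigned to the same headlines. A minimal sketch, assuming the inputs are paired VADER compound and TextBlob polarity values (both range from −1 to 1); the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    vader = [0.42, -0.10, 0.00, 0.64]     # made-up compound scores
    textblob = [0.20, 0.05, 0.00, 0.10]   # made-up polarity scores
    print(round(pearson_r(vader, textblob), 3))
```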
What Wall Street Talks About
LDA uncovered 5 interpretable topics that map cleanly to real business domains
T0
15.5%
Corporate / Retail
johnson, walmart, amazon, launch, business, service
T1
16.5%
Political / Trade
trump, boeing, tariff, president, deal, trade
T2
23.2%
Market / Trading
earnings, dow, wall street, S&P, trading, investor
T3
24.8%
Banking / Investment
dividend, bank, fund, goldman, sachs, portfolio
T4
20.1%
Big Tech / AI
nvidia, meta, apple, alphabet, chip, intelligence
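Topic percentages like these can be derived from the document-topic probability matrix that LDA produces. The page does not say which definition it used, so the sketch below shows two plausible ones as assumptions: the share of headlines whose most probable topic is each topic, and the mean topic probability across headlines. Function names are hypothetical.

```python
from collections import Counter


def dominant_topic_shares(doc_topic, n_topics):
    """Fraction of documents whose highest-probability topic is each topic.

    Ties go to the lowest topic index.
    """
    winners = Counter(
        max(range(n_topics), key=row.__getitem__) for row in doc_topic
    )
    total = len(doc_topic)
    return [winners[k] / total for k in range(n_topics)]


def mean_topic_probabilities(doc_topic):
    """Average probability of each topic over all documents."""
    n_docs = len(doc_topic)
    return [sum(column) / n_docs for column in zip(*doc_topic)]


if __name__ == "__main__":
    # Each row is one headline's topic distribution (toy values).
    doc_topic = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
    ]
    print(dominant_topic_shares(doc_topic, 3))
    print(mean_topic_probabilities(doc_topic))
```

The two definitions generally give different numbers, so it is worth stating which one a chart uses.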

The Shifting Narrative: Topic Trends Over 12 Months (all p < 0.001)

Connecting News to Returns
Topic probabilities are stronger return predictors than sentiment scores alone

Positive Correlations with Monthly Returns

Banking / Investment Topic
0.90
Earnings / Trading Topic
0.84
% Positive Headlines
0.82
VADER Mean Sentiment
0.51

Negative Correlations with Monthly Returns

Market / Wall Street Topic
−0.83
Political / Trade Topic
−0.67
Big Tech / AI Topic
−0.67
Volatility
−0.51
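Correlating headlines with monthly returns first requires rolling headline-level scores up to one row per company and month. A minimal sketch of that aggregation, assuming each input row is a (ticker, ISO date, VADER compound) tuple and that "% positive" means a compound score of at least 0.05; both the row layout and the cutoff are assumptions.

```python
from collections import defaultdict


def monthly_features(rows, positive_threshold=0.05):
    """Aggregate (ticker, 'YYYY-MM-DD', compound) rows by (ticker, 'YYYY-MM')."""
    buckets = defaultdict(list)
    for ticker, iso_date, compound in rows:
        buckets[(ticker, iso_date[:7])].append(compound)
    return {
        key: {
            "n": len(scores),
            "mean_compound": sum(scores) / len(scores),
            "pct_positive": sum(s >= positive_threshold for s in scores) / len(scores),
        }
        for key, scores in buckets.items()
    }


if __name__ == "__main__":
    rows = [
        ("AAPL", "2024-01-03", 0.4),
        ("AAPL", "2024-01-20", -0.2),
        ("AAPL", "2024-02-01", 0.1),
    ]
    for key, feats in sorted(monthly_features(rows).items()):
        print(key, feats)
```

Each resulting row can then be joined to that company's return for the same month before computing correlations or fitting a regression.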
Key Takeaways
01

Sentiment Signals Are Real

The share of positive headlines correlates strongly with monthly stock returns. Financial news isn't just noise; it captures market psychology.

r = 0.82 (% positive → returns)
02

Topics Predict More Than Tone

Topic probabilities correlate more strongly with returns than sentiment measures do. What news covers matters more than whether it sounds positive or negative.

r = 0.90 (banking topic → returns)
03

News Reacts, Doesn't Predict

Pre-earnings sentiment has near-zero predictive power for earnings surprises. That is consistent with efficient markets: the earnings surprise is truly surprising.

r = −0.09 (pre-earnings → surprise)
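The pre-earnings figure depends on how the window before each announcement is defined. The page does not specify it; the sketch below assumes a 7-day window that excludes the announcement day itself, with a hypothetical function name.

```python
from datetime import date, timedelta


def pre_earnings_sentiment(scored, earnings_day, window_days=7):
    """Mean sentiment of headlines in the `window_days` days before earnings.

    `scored` is an iterable of (date, compound) pairs. The earnings day is
    excluded so that post-announcement coverage cannot leak in. Returns None
    when no headline falls inside the window.
    """
    start = earnings_day - timedelta(days=window_days)
    in_window = [score for day, score in scored if start <= day < earnings_day]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


if __name__ == "__main__":
    scored = [
        (date(2024, 1, 10), -0.8),  # too early, ignored
        (date(2024, 1, 20), 0.5),
        (date(2024, 1, 24), 0.1),
        (date(2024, 1, 25), 0.9),   # earnings day, ignored
    ]
    print(pre_earnings_sentiment(scored, date(2024, 1, 25)))
```

One value per company and quarter, paired with that quarter's earnings surprise, gives the inputs for a correlation like the one reported above.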
Bottom Line

NLP reveals meaningful structure in financial news. Topic modeling outperforms sentiment analysis as a signal for stock returns.

Finnhub API, Python, VADER, TextBlob, LDA, TF-IDF, scikit-learn, statsmodels, OLS Regression, Mixed Effects Model, NLTK, pandas, seaborn, matplotlib