How 40,000 news headlines reveal sentiment patterns, topic structure, and market signals across America's biggest companies.
Data collection: 40,112 articles retrieved through the Finnhub API for 15 companies in the Technology, Finance, Healthcare, Consumer, and Industrial sectors.
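Below is a minimal sketch of the collection step. It assumes Finnhub's public `company-news` REST endpoint (`/api/v1/company-news`, with `symbol`, `from`, `to`, and `token` query parameters) and its JSON fields `id`, `datetime`, `headline`, and `source`. The helper names `build_news_url` and `parse_news` are hypothetical, as is the deduplication rule; neither is the project's code. Parsing is kept separate from the network call so it can be tested on a saved response.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Finnhub company-news endpoint (assumed from Finnhub's public REST API).
FINNHUB_NEWS_URL = "https://finnhub.io/api/v1/company-news"


def build_news_url(symbol: str, start: str, end: str, token: str) -> str:
    """Request URL for one ticker and date range (dates as YYYY-MM-DD)."""
    query = urlencode({"symbol": symbol, "from": start, "to": end, "token": token})
    return f"{FINNHUB_NEWS_URL}?{query}"


def parse_news(payload: str, symbol: str) -> list:
    """Convert the endpoint's JSON array into deduplicated headline records."""
    seen = set()
    records = []
    for item in json.loads(payload):
        headline = (item.get("headline") or "").strip()
        key = item.get("id") or headline  # hypothetical rule: fall back to the text if no id
        if not headline or key in seen:
            continue
        seen.add(key)
        records.append({
            "symbol": symbol,
            # 'datetime' is a UNIX timestamp in seconds; keep the UTC calendar date.
            "date": datetime.fromtimestamp(item["datetime"], tz=timezone.utc).date().isoformat(),
            "headline": headline,
            "source": item.get("source", ""),
        })
    return records
```

To fetch live data, pass the URL from `build_news_url` to `urllib.request.urlopen` and give the decoded body to `parse_news`. Finnhub enforces per-minute rate limits, so looping over 15 tickers and many date ranges usually needs a pause between requests.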
Data collection tools: Finnhub API, Python.

Preprocessing: Regex normalization, stopword removal (ISO and finance-specific lists), and WordNet lemmatization, followed by TF-IDF feature extraction per sector.
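The preprocessing step can be sketched in plain Python. This sketch rests on several assumptions and does not reproduce the project's pipeline:

- The `STOPWORDS` set is a small hypothetical stand-in for the ISO and finance-specific lists.
- WordNet lemmatization is omitted. With NLTK it would be `WordNetLemmatizer().lemmatize(token)`, after downloading the `wordnet` corpus.
- `tfidf` hand-rolls the weighting that scikit-learn's `TfidfVectorizer` applies by default: raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2 normalization of each document vector.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: a few generic English words plus
# finance terms that appear in almost every headline.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "as",
             "stock", "stocks", "shares", "inc", "corp", "company"}


def preprocess(text: str) -> list:
    """Lowercase, strip URLs and non-letters, tokenize, drop stopwords and 1-letter tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if len(tok) > 1 and tok not in STOPWORDS]


def tfidf(docs: list) -> list:
    """TF-IDF with scikit-learn-style defaults: raw counts, smoothed IDF, L2 norm."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors
```

Running `tfidf` separately on each sector's tokenized headlines yields per-sector features. Sorting one document's weights in descending order surfaces its most distinctive terms within that sector.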
Preprocessing tools: NLTK, scikit-learn.

Modeling: Dual sentiment scoring with VADER and TextBlob, plus LDA topic modeling with 5 topics and 500 iterations on 39,293 documents.
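The LDA step can be illustrated with a collapsed Gibbs sampler, one common way to fit LDA. This is a swapped-in technique for exposition. The source does not say which implementation was used, and scikit-learn's `LatentDirichletAllocation`, for example, fits LDA with variational Bayes instead. The defaults mirror the stated 5 topics and 500 iterations. The priors `alpha=0.1` and `beta=0.01` and the function name `lda_gibbs` are assumptions.

```python
import random


def lda_gibbs(docs, n_topics=5, n_iter=500, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (e.g. preprocessed headlines).
    Returns (doc_topic, topic_word): per-document topic mixtures and
    per-topic word distributions, each summing to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]   # tokens of each topic in each document
    nkw = [[0] * n_words for _ in topics]  # occurrences of each word in each topic
    nk = [0] * n_topics                    # total tokens assigned to each topic
    z = []                                 # current topic of every token

    def update(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, index[w], 1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                update(d, z[d][i], v, -1)  # remove this token from the counts
                # Full conditional p(z_i = k | all other assignments), unnormalized.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + n_words * beta)
                           for k in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, z[d][i], v, 1)

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[k][index[w]] + beta) / (nk[k] + n_words * beta) for w in vocab}
                  for k in topics]
    return doc_topic, topic_word
```

Each row of `doc_topic` is one document's topic mixture. Averaging rows by company and month gives topic-share features, such as a banking topic's share, for the regressions. `sorted(topic_word[k], key=topic_word[k].get, reverse=True)[:10]` lists a topic's top words. Pure Python is likely far too slow for 39,293 documents at 500 sweeps, so this version is for reading rather than production.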
Modeling tools: VADER, TextBlob, LDA.

Analysis: OLS regression linking sentiment and topic features to monthly stock returns and quarterly earnings surprises.
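The regression step fits ordinary least squares. With statsmodels this would typically be `sm.OLS(y, sm.add_constant(X)).fit()`. The stdlib sketch below instead solves the normal equations (X'X)b = X'y directly, so the arithmetic is visible. The feature layout is an assumption: one row per company-month, with columns such as % positive headlines and a topic share.

```python
def ols(X, y):
    """OLS with an intercept, solving the normal equations (X'X) b = X'y.

    X: list of feature rows (hypothetical layout, e.g. [pct_positive, topic_share]).
    y: matching outcomes (e.g. monthly returns).
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular: features are collinear")
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, coef):
    """Share of the variance in y explained by the fitted model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

Solving the normal equations squares the condition number of X. For strongly correlated features, such as overlapping topic shares, statsmodels' default pseudo-inverse solver is the more robust choice.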
Analysis tools: statsmodels, OLS.

Finding: Positive headlines correlate strongly with higher stock returns. Financial news is not just noise; it appears to capture market psychology.
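The "% positive" feature behind this result can be sketched as a monthly share of positive headlines. The sketch makes the following assumptions:

- Each headline already has a VADER compound score. In NLTK this is `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, after downloading `vader_lexicon`.
- Headlines are labeled with the ±0.05 cutoffs suggested in VADER's documentation.
- TextBlob polarity is left out, because the source does not say how it was combined with VADER.

The function names are hypothetical.

```python
from collections import defaultdict

# Compound-score cutoffs suggested in the VADER documentation.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label(compound: float) -> str:
    """Map a VADER compound score to a sentiment class."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def monthly_pct_positive(scored):
    """scored: iterable of (ISO date 'YYYY-MM-DD', compound). Returns {'YYYY-MM': % positive}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for day, compound in scored:
        month = day[:7]
        totals[month] += 1
        if label(compound) == "positive":
            positives[month] += 1
    return {m: 100.0 * positives[m] / totals[m] for m in sorted(totals)}
```

Grouping by company as well as month yields a series that can be aligned with monthly returns.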
Result: r = 0.82 (% positive headlines → returns).

Finding: Topic modeling outperforms sentiment as a return predictor. What the news covers matters more than whether it sounds positive or negative.
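Comparing predictors by correlation, as the r values here do, amounts to computing Pearson's r between each feature series and returns. This is a minimal sketch; `rank_features` and the example feature names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, returns):
    """Order candidate predictors by |r| with returns, strongest first."""
    scores = {name: pearson_r(values, returns) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Pearson's r measures only linear association in the sample at hand. Two r values computed against the same returns are usually compared with a significance test or a bootstrap rather than by eye.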
Result: r = 0.90 (banking topic → returns).

Finding: Pre-earnings sentiment has near-zero predictive power for earnings surprises. This is consistent with efficient markets: news tone before an announcement does not anticipate the surprise.
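Testing whether pre-earnings sentiment predicts the surprise requires pairing each earnings event with a sentiment summary from before the announcement. In this sketch the surprise is the percentage gap between reported and consensus EPS. The 7-day pre-announcement window and both function names are hypothetical choices; the source does not state the window it used.

```python
from datetime import date, timedelta


def surprise_pct(actual: float, estimate: float) -> float:
    """Earnings surprise as a percentage of the consensus estimate."""
    return 100.0 * (actual - estimate) / abs(estimate)


def pre_earnings_sentiment(scored, report_date: date, window_days: int = 7):
    """Mean compound score of headlines dated in the window_days before report_date.

    scored: iterable of (datetime.date, compound score). The report date itself
    is excluded. Returns None when no headline falls inside the window.
    """
    start = report_date - timedelta(days=window_days)
    values = [c for d, c in scored if start <= d < report_date]
    return sum(values) / len(values) if values else None
```

Correlating these (sentiment, surprise) pairs across all events gives the kind of r reported for this finding. `surprise_pct` is undefined when the estimate is zero, so such events need to be dropped or handled separately.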
Result: r = −0.09 (pre-earnings sentiment → surprise).

Conclusion: NLP reveals meaningful structure in financial news, and topic modeling outperforms sentiment analysis as a signal for stock returns.