layout: post title: “Research Paper Valuation Ratios, Surprises, Uncertainty or Sentiment: How Does Financial Machine Learning Predict Returns From Earnings Announcements?” date: 2025-06-20 category: trading —————–

Authors

Matthias Schnaubelt, Oleg Seifert Affiliation: University of Erlangen-Nürnberg, School of Business, Economics and Society Year: 2015

Abstract Summary

This paper explores how financial machine learning (ML) models, particularly random forests, can predict abnormal returns around earnings announcements. By leveraging over 45,000 earnings announcements from S\&P1500 firms, the study integrates a variety of data sources including analyst forecasts, earnings press releases, and conference call transcripts. The authors categorize features into valuation ratios, forecast errors, information uncertainty, and sentiment polarity. The model outperforms linear benchmarks and delivers a viable zero-investment trading strategy with an annualized return of 11.63% and Sharpe ratio of 1.39 after transaction costs.

Key Ideas and Comments

Page 14: Table 3 - Features

Table 3 categorizes 54 features used in the predictive model into four groups:

  1. Valuation Ratios (VR): Ratios like earnings-to-price, sales-to-price, and book-to-market derived from recent accounting data.
  2. Forecast Errors (FE): Surprises calculated as the difference between actual and mean analyst forecast, scaled by price.
  3. Uncertainty and Information Quality (UIQ): Measures such as number and variance of analyst forecasts, and conference call lengths.
  4. Textual Sentiment Polarity (POL): Sentiment scores from conference call transcripts using the Loughran-McDonald finance dictionary, segmented by topic and speaker.

This table is a goldmine for any researcher or practitioner aiming to build sentiment or earnings-based ML alphas. It provides data definitions, transformation methods, and citation support.

Page 23: Trading Strategy Performance

A zero-investment trading strategy is implemented using top/bottom decile predictions from the model (T = 5 days horizon):

  • Capital Allocation: Only 20% of capital is allocated daily, spread evenly across selected signals.
  • Returns: The strategy yields annualized returns of 11.63% with a Sharpe ratio of 1.39, even after conservative transaction costs.
  • Market Neutrality: Positions are hedged using the iShares Core S\&P Total U.S. Stock Market ETF (ITOT).

This shows the economic utility of ML predictions beyond just statistical significance.

Data and Datafields Used

  • Earnings Data: From I/B/E/S, including earnings-per-share, revenue, book-value, dividend, and cashflows.
  • Price Data: From Thomson Reuters Eikon, adjusted for dividends and splits.
  • Conference Call Transcripts: Sourced from Thomson Reuters and Seeking Alpha, parsed into structured segments.
  • Analyst Forecasts: Including means, standard deviations, number of revisions and dispersion measures.

Potential Issues with Painstaking Details

  1. Feature Overfitting Risk

    • With 54 features, the model may capture noise instead of signal, especially in small-cap stocks with sparse data.
    • The use of high-dimensional categorical encoding (e.g., decile bins) can exacerbate overfitting unless regularized effectively.
  2. Survivorship Bias in Data Selection

    • Even though the authors restrict the sample to S\&P1500, data from firms that left the index mid-period might be incomplete.
    • Firms without consistent transcript data (especially small-caps) may skew the training set toward more stable, well-covered firms.
  3. Timing and Look-Ahead Bias

    • Using features like analyst forecast dispersion near the event date may inadvertently leak future information if timestamps aren’t perfectly aligned.
    • The reliance on press release and transcript analysis assumes near-instant processing, which may not reflect real-world delays in dissemination.
  4. Generalizability Beyond U.S. Markets

    • The entire dataset is U.S.-centric (S\&P1500). Market microstructure and analyst behavior differ significantly in other regions.
    • Sentiment dictionaries and behavior assumptions (e.g., same-day calls) may not hold in Europe or Asia.

Reflection

This paper is an excellent synthesis of financial ML and event-driven investing. It provides a rigorous and holistic view on modeling earnings announcement reactions and serves as a practical roadmap for constructing ML alphas. The clear structure, extensive reference to prior literature, and strong empirical results make it valuable for both researchers and practitioners.

Reference