# U.S. Treasury Discount Bond Database


This reference dataset for Treasury yields is based on the method in our paper "Stripping the Discount Curve - a Robust Machine Learning Approach". We introduce a robust, flexible and easy-to-implement method for estimating the yield curve from Treasury securities. This method is non-parametric and optimally learns basis functions with an economically motivated smoothness reward. In an extensive empirical study on daily U.S. Treasury securities, we show that our method strongly dominates all parametric and non-parametric benchmarks. Our method achieves substantially smaller out-of-sample yield and pricing errors, while being robust to outliers and data selection choices. We attribute the superior performance to the optimal trade-off between flexibility and smoothness, which positions our method as the new standard for yield curve estimation.

The reference dataset for discount bond returns and factors is based on our paper "Shrinking the Term Structure". We introduce a conditional factor model for the term structure of Treasury bonds, which unifies non-parametric curve estimation with cross-sectional asset pricing. Our robust, flexible and easy-to-implement method learns the discount bond excess return curve directly from observed returns of Treasury securities. This curve lies in a reproducing kernel Hilbert space, which is derived from economic first principles, and optimally trades off smoothness against return fitting. We show that a low-dimensional factor model arises because a sparse set of basis functions spans the estimated discount bond excess return curves. The estimated factors are investable portfolios of traded assets, which replicate the full term structure and are sufficient to hedge against interest rate changes. In an extensive empirical study on U.S. Treasuries, we show that the discount bond excess return curve is well explained by four factors, which capture polynomial shapes of increasing order and are necessary to explain the term structure premium. The cash flows of coupon bonds fully explain the factor exposure, and play the same role as firm characteristics in equity modeling. In this sense, "cash flows are covariances". We introduce a new measure for the time-varying complexity of bond markets based on the exposure to higher-order factors, and show that changes in market complexity affect the term structure premium.

The data is updated regularly. Last update: September 7, 2023.

Our data provide estimates of annualized continuously compounded zero-coupon yields. Datasets come in four combinations of two observation frequencies (daily and monthly) and two maturity granularities (daily and monthly). Datasets at monthly frequency report the yields for the last day of every month. We annualize the daily yields with the usual 365-day year convention, but given our estimates it is straightforward to modify this normalization. For details on the convention adopted to calculate monthly maturities, please visit our Frequently Asked Questions page. The sampling period starts on June 14, 1961.
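As a hedged illustration of the convention above, the following sketch converts an annualized continuously compounded yield into a discount bond price and restates the yield under a different day count. The function names and the sample numbers are assumptions made for illustration; they are not part of the dataset or of the authors' code.

```python
import math

DAYS_PER_YEAR = 365  # day-count convention stated in the text


def discount_factor(yield_cc, maturity_days, days_per_year=DAYS_PER_YEAR):
    """Price of a zero-coupon bond paying 1 at maturity, given an
    annualized continuously compounded yield."""
    tau = maturity_days / days_per_year  # time to maturity in years
    return math.exp(-yield_cc * tau)


def reannualize(yield_cc, new_days_per_year, old_days_per_year=DAYS_PER_YEAR):
    """Restate the yield under another day count so that the discount
    factor for any fixed number of days is unchanged."""
    return yield_cc * new_days_per_year / old_days_per_year


# Hypothetical example: a 4% yield at a maturity of 730 days.
y = 0.04
p = discount_factor(y, 730)  # exp(-0.04 * 730 / 365)

# Same curve point expressed with a 360-day year (illustrative choice).
y_360 = reannualize(y, 360)
p_360 = discount_factor(y_360, 730, days_per_year=360)
```

Because the rescaling only changes the unit of time, `p` and `p_360` agree up to floating-point error.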

In summary, we have the following four datasets: (1) Yields at daily frequency for daily maturities, (2) Yields at monthly frequency for daily maturities, (3) Yields at daily frequency for monthly maturities, (4) Yields at monthly frequency for monthly maturities.

The first column in each dataset is the observation date in the format YYYY-MM-DD. The second column is the maximum maturity of securities on the observation date, which sets the boundary for the extrapolation region. The titles of the remaining columns are times to maturity in units of the corresponding granularity of the dataset. 10-year, 20-year, and 30-year yield estimates are provided starting from June 14, 1961, January 5, 1973, and November 25, 1985, respectively.
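The column layout described above could be read along the following lines. This is a minimal sketch under stated assumptions: the inline sample, the header names `date` and `max_maturity`, and all numbers are invented for illustration, and the code assumes that the maximum maturity is expressed in the same unit as the column titles, which the description does not state explicitly.

```python
import csv
import io

# Hypothetical excerpt in the described layout (monthly maturities).
# Header names and all values are made up for illustration.
SAMPLE = """date,max_maturity,12,120,240,360
1961-06-14,150,0.025,0.038,0.039,0.039
1973-01-05,300,0.055,0.064,0.066,0.066
1985-11-25,360,0.076,0.095,0.099,0.100
"""


def read_yields(text):
    """Return {date: {maturity: yield or None}}. None marks maturities
    beyond the maximum observed maturity (the extrapolation region)."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    maturities = [int(h) for h in header[2:]]  # column titles are maturities
    curves = {}
    for row in rows:
        date, max_maturity = row[0], float(row[1])
        # ASSUMPTION: max_maturity uses the same unit as the column titles.
        curves[date] = {
            m: (float(v) if m <= max_maturity else None)
            for m, v in zip(maturities, row[2:])
        }
    return curves


curves = read_yields(SAMPLE)
```

Masking the extrapolation region is a design choice of this sketch; a reader who wants the extrapolated estimates can keep every column instead.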

More details on the construction are in "Stripping the Discount Curve - a Robust Machine Learning Approach".

Our data provide daily returns and excess returns of zero-coupon bonds. The zero-coupon bonds are tradable portfolios of U.S. Treasuries. The first column in each dataset is the observation date in the format YYYY-MM-DD. We assume the usual 365-day year convention. Returns at daily frequency represent one-business-day returns such that the return on day t is the return relative to business day t+1. The sampling period starts on June 14, 1961, and includes maturities up to 10 years.

We have the following five datasets: (1) returns at daily frequency for daily maturities, (2) returns at daily frequency for annual maturities, (3) risk-free returns at daily frequency, (4) excess returns (returns minus risk-free returns) at daily frequency for daily maturities, and (5) excess returns (returns minus risk-free returns) at daily frequency for annual maturities.
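The relation between the return, risk-free and excess return datasets listed above can be sketched as follows. The dictionary layout, the function name and the sample values are assumptions for illustration only; the published files may be organized differently.

```python
def excess_returns(returns, risk_free):
    """Subtract the same-day risk-free return from the return of every
    maturity, as in 'returns minus risk-free returns'."""
    return {
        date: {m: r - risk_free[date] for m, r in by_maturity.items()}
        for date, by_maturity in returns.items()
    }


# Hypothetical daily returns for annual maturities 1 and 10 (made-up values).
returns = {
    "2023-09-05": {1: 0.00021, 10: -0.00310},
    "2023-09-06": {1: 0.00019, 10: 0.00125},
}
# Hypothetical daily risk-free returns for the same dates.
risk_free = {"2023-09-05": 0.00020, "2023-09-06": 0.00020}

excess = excess_returns(returns, risk_free)
```

Adding the risk-free return back to each entry of `excess` recovers the original returns.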

More details on the construction are in "Shrinking the Term Structure".

# Term structure factors

The data provide the excess returns of the four KR factors. We show that these four factors provide an excellent representation of the discount bond excess return curve. Excess returns remove the level (risk-free returns) of discount bond returns. When modeling discount bond returns instead of excess returns, the risk-free returns should be added to obtain a five-factor representation.

The dataset contains the daily excess returns of the KR-4 factors and the daily risk-free returns, with the sampling period starting on June 14, 1961. The details for the KR-4 factors are in "Shrinking the Term Structure" and for the risk-free return in "Stripping the Discount Curve - a Robust Machine Learning Approach".

# Bond Market Condition Measures

The exposure to term structure risk factors provides a measure of the state of the bond market. The cross-sectional variation that is explained by our term structure risk factors is time-varying and can be informative about economic conditions. Therefore, we introduce two novel measures of the state of the bond market. The Idiosyncratic Treasury Volatility (IT-VOL) measures the idiosyncratic volatility normalized by the overall volatility. It captures how hard it is to explain the observed bond returns even with a flexible model. The Treasury Market Complexity (T-COM) measures the complexity of the bond market. It captures how much variation is explained by higher-order term structure factors. Our measures of the bond market condition are strongly related to the term structure risk premium.

The data provide the daily IT-VOL and T-COM values starting on June 14, 1961. More details on the construction are in "Shrinking the Term Structure".

We provide GitHub code with examples and documentation of how to use the KR method for yield, return and factor estimation.

# Authors

This reference dataset is the result of the research of Damir Filipović, Markus Pelger and Ye Ye.

Prof. Dr. Damir Filipović

Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Finance Institute

Prof. Dr. Markus Pelger

Stanford University

Dr. Ye Ye

Uber and Stanford University