Browser-Based Excel Cleaner

Dec. 22, 2021

Excel Errors are Everywhere

The London Whale was a spreadsheet formula error. At the London Olympics, they sold more tickets than seats due to a spreadsheet data-entry error. COVID cases were miscounted due to Excel 97-2003 (xls) row limits. In genomics, auto-correct errors have become so endemic that scientists have changed gene names rather than tangle with Excel auto-correct settings. If you search Google News for "spreadsheet errors" you will find examples affecting financial services in the last few business days (as 12/22/21).

Excel as a Database

In finance (and many other domains), Excel is commonly used as an ad-hoc database. A primary source of errors for this off-label use occurs when Excel auto-corrects text to dates or numbers. Today, we're introducing a linting tool that recognizes and reverses these sort of errors.

Problems Solved by our Excel Cleaner

The CUSIP Problem

Any CUSIP that is all numbers or has an E near the end will be auto-corrected by Excel to a floating point number in scientific notation. Click here to see the exact CUSIP pattern that results in this error. Approximately 15% of all N-PORT filings have one or more positions whose CUSIP will be auto-corrected by Excel into a number. For NAIC transaction data, approximately 51% of life, property and casualty insurers have one or more transactions in such securities. Empirasign maintains a comprehensive list of such problem bonds here:

The 32nds Problem

Prices in Structured Products (and a few other areas of US Fixed Income) are still quoted in 32nds. (Thank you, England!) For bonds near par, Excel will successfully accept them. For example, if you key 99-31 into a spreadsheet or load a CSV with such a value, it will remain as 99-31. However, for low dollar price bonds, such as IOs, they may be auto-converted to dates. For example, if you enter 11-02 into Excel, it will be converted into November 2nd of the current year. The conversion will even occur for dates that make no sense. For example, 4-31 will be converted to April 1, 1931! This will also happen for February dates above 28. These occur quite frequently for specs whose pay-ups are around 3 points.

Find the Empirasign Excel Cleaner at:

How to Use

After selecting a file to scan, our linter will get to work. All data processing takes place entirely within your browser (see Data Safety below). If our linter detects fixable errors, they will appear in two sections in the lower part of the page (see images below). The table on left will enumerate each error, its location, and a proposed fix. The section to the right is auto-generated VBA code to apply these fixes. To run these fixes, copy this code into a new VBA module on your spreadsheet, and then execute the LintWorkbook macro. All coded corrections are commented for easy inspection before macro execution.

Linter Found Errors

Linter VBA Fixes

Data Safety

None of the data in your spreadsheet will leave your browser and hit any remote servers. To verify, hit F12 (Developer Tools) on your browser and watch the Network tab.

Other Resources

  • ExceLint: Automatically Finding Spreadsheet Formula Errors [paper] [github]
  • CUSTODES: Automatic Spreadsheet Cell Clustering and Smell Detection Using Strong and Weak Features [paper]
  • Spreadsheets For Developers [talk] [slides]
  • Matt Parker on COVID case count errors [video]

Parser Replay Viewer

Nov. 23, 2021

Empirasign first introduced our Parser as a Service in 2017 to provide users with access to our market-leading, proprietary parsing technology via the simplicity of an API call. In the Extract, Transform, and Load process for message handling, the heavy lift is the Transform step. Our Parser as a Service enables clients to off-load this step to an outside vendor whose long-standing core competency is transforming unstructured data into structured data.

Humans & APIs

Human interaction with APIs can be a challenge -- especially when dealing with large, unstructured data objects such as email messages. To make onboarding and reconciliation easier, our parser has long returned (along with parsed results) a unique identifier. We call this a tx_id. This identifier uses the universally unique identifier format UUID4. Empirasign-client communications can reference this identifier instead of emailing back and forth large blobs of data. But what if we could do better?

Replay GUI

Using the tx_id, any client can view the parsed results of an email they had previously submitted. Easily confirm expectations vs. results and quickly file bug reports, all on one page. See screenshots below.

Report Aggregate Section

This sections show summary stats and code revision information. There is also a button to file a bug report that will automatically link to the tx_id.

Report Aggregate

Validated Results Section

Validated Results

HTML Email Body Section

Most emails are transmitted via the multipart/alternative or multipart/mixed subtype of the MIME specification. These subtypes generally allow for two message bodies--one nicely-formatted (text/html), and one pure text (text/plain). The example screenshot below of a runs message shows the nicely-formatted message body that you generally see today. We also capture and process the text/plain message body if that section is more informative.

HTML Email Body

Other sections of the report (not shown here), include raw data acquired (before normalization and validation), text email body, attachments, and "Export to CSV" buttons.

Value Prop

How does this differ from a regular JSON Pretty Printer? Most notably, our reports are constructed how our clients expect to see data organized (e.g. bid to the left of offer, and separate sections for each part of the process). However, the biggest difference from a generic Pretty Printer is that each time the page is loaded, the parser is re-run -- hence the "replay" name. As the page is dynamic, the client can quickly confirm if any recently applied code changes have affected data yield. The dynamism is the power in this tool.

Data Privacy

To allow for this replay functionality and for debugging purposes, all parsed messages are stored for two weeks. After which, they are permanently deleted. To opt out of this functionality and its necessary ephemeral data storage, simply pass along the boolean replay=false with each parse API call.

Feature Status and Road Map

Currently available:
• Corporates and Munis:{tx_id}/
• BWICs:{tx_id}/

Set to release in the next few business days:
• Structured Products:{tx_id}/


Get started now with Empirasign Parser as Service. Our no-obligation Freemium service tier allows for 100 messages parsed per day (BWICs or runs). Contact us to get started today.

View our Parser as a Service docs here.

New York State Registered Hedge Fund IARs

Nov. 18, 2021

A list of New York-registered hedge funds and their representatives who are specifically licensed to do advisory work in New York is now available on the Empirasign web app.

Why is this new?

This information has never been publicly available before because representatives of investment advisers and solicitors were not required to register with the state of New York prior to February 1st of this year. Revisions to Title 13, part 11 of the New York Code of Rules and Regulations adopted in December 2020 introduced new registration and examination requirements for Investment Adviser Representatives (IARs) who meet certain conditions -- generally, six or more NY State domiciled investors and 10% or more of AUM being controlled by individual investors. The new rules went into effect on February 1st, with a registration compliance deadline of August 31st. Examination requirements must be completed by December 2nd.

A press release from the NY Attorney General explains that these changes were enacted in order to modernize and simplify the state's securities registration process, as well as to help protect New York investors from fraud - which became significantly more prevalent as a result of the COVID-19 pandemic.

Why is this information useful?

Prior to these changes, the contact information for portfolio managers and analysts at SEC-registered Hedge Funds in New York was largely unavailable, as they generally did not have to register as IARs. Previously, one would have to rely upon a hodge-podge of personal and internal tools or subscribe to error-ridden commercial personnel database products in order to ascertain these contacts. For those who have met the criteria described above, this is no longer the case. Verifiable names, addresses, contact information, and place of employment for these IARs is now published in an easy-to-use format on the Empirasign web app.

Where is this information sourced from?

The application for initial registration as an investment adviser is made by filing Form ADV with the CRD/IARD system. These forms disclose information including a firm's AUM and compensation structure, allowing us to identify firms which meet the criteria of a hedge fund.


Largest Holders by Issuer

Oct. 25, 2021

Discover which funds hold the most valuable positions from any issuer or deal(s) with our Largest Holders by Issuer feature. This tool aggregates and ranks the market value of each fund's portfolio holdings based on our N-PORT Filings database, which provides the most up-to-date holdings data available via the SEC. Simply enter any issuer ticker, CUSIP, or ISIN, select which deals you wish to view, and hit Submit.

Largest Holders by Issuer

The search results table can be sorted by any column (Fund Company, Fund Name, Value Date, Number of Positions, or Market Value). All search results tables can be exported by clicking on the green Excel icon.

Click on any Fund Name to view a complete list of all holdings records for a particular fund from our N-PORT Filings Database. Fund Holdings Search results will include:

  • Complete current and historical portfolio holding records.
  • Directors and officers for the fund company.*
  • Registered representatives for the advisers of the fund.*
  • All pricing services used by the fund.*
  • A visualization of how the fund's assets are allocated across sectors according to their current report, as well as previous filings.
  • Accession Numbers hyperlinked to the original form N-PORT and N-CEN filings on the SEC EDGAR website.*

These results can also be exported via the green CSV icon, located at the top of the page.

*Derived from Form N-CEN filings. See our blog post on how we link bonds to people here.

Advan Foot-Traffic Data

April 29, 2021

The next several months will be a defining period for the commercial mortgage-backed market. Empirasign subscribers now have the competitive advantage of access to foot-traffic data from geolocation analytics firm Advan Research Corporation to help navigate this critical period of recovery and growth. Advan's geofencing technology aggregates mobile phone usage patterns from specific properties, giving Empirasign users the ability to monitor the foot-traffic patterns observed at hotel, retail, multi-family, and office building locations for up to 95% of US CMBS deals. Click on the Advan Foot Traffic button at the top of eligible CMBS bond profile pages to instantly view a graph of foot-traffic history for the commercial properties underlying a particular deal.

Advan Button Circled Bond Profile Page

Advan Chart BANK 2018_BN15

Users can also view comparison charts of similar deals, with the choice of two statistical approaches. Select "Nominal Comps" to view a comparison that prioritizes deals of similar absolute foot traffic volumes. Select "Normalized Comps" to view a comparison that prioritizes deals of similar foot traffic patterns--regardless of baseline levels. Additionally, one can select or deselect any of the deals shown in either chart type to focus on specific data relationships.

All charts are exportable as a shareable image by clicking on the PNG icon in the top right corner.

1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    |  Next