Parser Getting Started

Feb. 8, 2022

Parser as a Service Background

Empirasign first introduced our Parser as a Service in 2017 to provide clients with direct access to our market-leading, proprietary parsing technology via the simplicity of an API call. In the Extract, Transform, and Load process for message handling, the heavy lift is the Transform step. Our Parser as a Service enables clients to off-load this step to an outside vendor whose long-standing core competency is transforming unstructured data into structured form. Our Freemium Service Tier made it easier than ever to get started. However, technical hurdles remain.

Inbox Parsing Demo Scripts

Having removed the cost hurdle, we aim to dispense with any remaining technical hurdles with the introduction of two demo parsing scripts--one for IMAP and one for Outlook.

Pseudo Code for Demo Scripts

connect to email INBOX (IMAP or Outlook)
for message in INBOX:
 parse message via API call
 save parsed results to disk as JSON blob
 if remaining quota OK
   rinse/repeat
 else
   exit loop

IMAP Script (Cross-Platform)

https://gist.github.com/empirasign/8eac751354e8cc0c9d0318e7a82005f6

Outlook Script (Windows only)

https://gist.github.com/empirasign/8237e8e68243e9f2532e88cf17cb84ca

Common Dependencies

If you use the Outlook version, pywin32 is also required (Windows only).

The example scripts send runs to our corporate runs parser endpoint. Other endpoints include:
/v1/parse-run/ (Structured Products runs)
/v1/parse-bwic/ (Structured Products BWICs)
/v1/parse-corp/ (Corporate Bonds and Muni runs)
/v1/parse-loan/ (Levered Loans runs)

Link to full docs: www.empirasign.com/api-docs/

Parser GUI

A GUI is available to replay and view any previously parsed message. As we don't permanently store messages, this replay feature is only available for two weeks after an API call is made. You can disable this temporary data storage by sending along store=False as a parameter.

Example GUI replay and debugging URLs:

https://www.empirasign.com/parse-run/{tx_id}/
https://www.empirasign.com/parse-bwic/{tx_id}/
https://www.empirasign.com/parse-corp/{tx_id}/

Each API call returns a tx_id in the response object.

For more information on the GUI replay feature, please consult this blog post:
www.empirasign.com/blog/Parser-Replay-Viewer/

A few sample screenshots are below:

Parser Validated Records

Freemium

Contact us to get started today with Empirasign's Parser as a Service. The Freemium Service Tier allows for 100 messages parsed per day (BWICs or Runs).

Getting an API Key

If you're already an Empirasign customer, it's best to get in touch with your account representative. If not, send an email to info@empirasign.com with Parser in the subject line.

More to come

In 2022, we'll be doing these technical user and developer-focused tutorials on a regular basis.



Browser-Based Excel Cleaner

Dec. 22, 2021

Excel Errors are Everywhere

The London Whale was a spreadsheet formula error. At the London Olympics, they sold more tickets than seats due to a spreadsheet data-entry error. COVID cases were miscounted due to Excel 97-2003 (xls) row limits. In genomics, auto-correct errors have become so endemic that scientists have changed gene names rather than tangle with Excel auto-correct settings. If you search Google News for "spreadsheet errors" you will find examples affecting financial services in the last few business days (as 12/22/21).

Excel as a Database

In finance (and many other domains), Excel is commonly used as an ad-hoc database. A primary source of errors for this off-label use occurs when Excel auto-corrects text to dates or numbers. Today, we're introducing a linting tool that recognizes and reverses these sort of errors.

Problems Solved by our Excel Cleaner

The CUSIP Problem

Any CUSIP that is all numbers or has an E near the end will be auto-corrected by Excel to a floating point number in scientific notation. Click here to see the exact CUSIP pattern that results in this error. Approximately 15% of all N-PORT filings have one or more positions whose CUSIP will be auto-corrected by Excel into a number. For NAIC transaction data, approximately 51% of life, property and casualty insurers have one or more transactions in such securities. Empirasign maintains a comprehensive list of such problem bonds here: www.empirasign.com/cusip-excel-rosetta/.

The 32nds Problem

Prices in Structured Products (and a few other areas of US Fixed Income) are still quoted in 32nds. (Thank you, England!) For bonds near par, Excel will successfully accept them. For example, if you key 99-31 into a spreadsheet or load a CSV with such a value, it will remain as 99-31. However, for low dollar price bonds, such as IOs, they may be auto-converted to dates. For example, if you enter 11-02 into Excel, it will be converted into November 2nd of the current year. The conversion will even occur for dates that make no sense. For example, 4-31 will be converted to April 1, 1931! This will also happen for February dates above 28. These occur quite frequently for specs whose pay-ups are around 3 points.

Find the Empirasign Excel Cleaner at: www.empirasign.com/excel-cleaner/

How to Use

After selecting a file to scan, our linter will get to work. All data processing takes place entirely within your browser (see Data Safety below). If our linter detects fixable errors, they will appear in two sections in the lower part of the page (see images below). The table on left will enumerate each error, its location, and a proposed fix. The section to the right is auto-generated VBA code to apply these fixes. To run these fixes, copy this code into a new VBA module on your spreadsheet, and then execute the LintWorkbook macro. All coded corrections are commented for easy inspection before macro execution.

Linter Found Errors

Linter VBA Fixes

Data Safety

None of the data in your spreadsheet will leave your browser and hit any remote servers. To verify, hit F12 (Developer Tools) on your browser and watch the Network tab.

Other Resources

  • ExceLint: Automatically Finding Spreadsheet Formula Errors [paper] [github]
  • CUSTODES: Automatic Spreadsheet Cell Clustering and Smell Detection Using Strong and Weak Features [paper]
  • Spreadsheets For Developers [talk] [slides]
  • Matt Parker on COVID case count errors [video]


Parser Replay Viewer

Nov. 23, 2021

Empirasign first introduced our Parser as a Service in 2017 to provide users with access to our market-leading, proprietary parsing technology via the simplicity of an API call. In the Extract, Transform, and Load process for message handling, the heavy lift is the Transform step. Our Parser as a Service enables clients to off-load this step to an outside vendor whose long-standing core competency is transforming unstructured data into structured data.


Humans & APIs

Human interaction with APIs can be a challenge -- especially when dealing with large, unstructured data objects such as email messages. To make onboarding and reconciliation easier, our parser has long returned (along with parsed results) a unique identifier. We call this a tx_id. This identifier uses the universally unique identifier format UUID4. Empirasign-client communications can reference this identifier instead of emailing back and forth large blobs of data. But what if we could do better?


Replay GUI

Using the tx_id, any client can view the parsed results of an email they had previously submitted. Easily confirm expectations vs. results and quickly file bug reports, all on one page. See screenshots below.


Report Aggregate Section

This sections show summary stats and code revision information. There is also a button to file a bug report that will automatically link to the tx_id.

Report Aggregate


Validated Results Section

Validated Results


HTML Email Body Section

Most emails are transmitted via the multipart/alternative or multipart/mixed subtype of the MIME specification. These subtypes generally allow for two message bodies--one nicely-formatted (text/html), and one pure text (text/plain). The example screenshot below of a runs message shows the nicely-formatted message body that you generally see today. We also capture and process the text/plain message body if that section is more informative.

HTML Email Body


Other sections of the report (not shown here), include raw data acquired (before normalization and validation), text email body, attachments, and "Export to CSV" buttons.


Value Prop

How does this differ from a regular JSON Pretty Printer? Most notably, our reports are constructed how our clients expect to see data organized (e.g. bid to the left of offer, and separate sections for each part of the process). However, the biggest difference from a generic Pretty Printer is that each time the page is loaded, the parser is re-run -- hence the "replay" name. As the page is dynamic, the client can quickly confirm if any recently applied code changes have affected data yield. The dynamism is the power in this tool.


Data Privacy

To allow for this replay functionality and for debugging purposes, all parsed messages are stored for two weeks. After which, they are permanently deleted. To opt out of this functionality and its necessary ephemeral data storage, simply pass along the boolean replay=false with each parse API call.


Feature Status and Road Map

Currently available:
• Corporates and Munis: https://www.empirasign.com/replay/parse-corp/{tx_id}/
• BWICs: https://www.empirasign.com/replay/parse-bwic/{tx_id}/

Set to release in the next few business days:
• Structured Products: https://www.empirasign.com/replay/parse-run/{tx_id}/


Freemium

Get started now with Empirasign Parser as Service. Our no-obligation Freemium service tier allows for 100 messages parsed per day (BWICs or runs). Contact us to get started today.


View our Parser as a Service docs here.



New York State Registered Hedge Fund IARs

Nov. 18, 2021

A list of New York-registered hedge funds and their representatives who are specifically licensed to do advisory work in New York is now available on the Empirasign web app.


Why is this new?

This information has never been publicly available before because representatives of investment advisers and solicitors were not required to register with the state of New York prior to February 1st of this year. Revisions to Title 13, part 11 of the New York Code of Rules and Regulations adopted in December 2020 introduced new registration and examination requirements for Investment Adviser Representatives (IARs) who meet certain conditions -- generally, six or more NY State domiciled investors and 10% or more of AUM being controlled by individual investors. The new rules went into effect on February 1st, with a registration compliance deadline of August 31st. Examination requirements must be completed by December 2nd.

A press release from the NY Attorney General explains that these changes were enacted in order to modernize and simplify the state's securities registration process, as well as to help protect New York investors from fraud - which became significantly more prevalent as a result of the COVID-19 pandemic.


Why is this information useful?

Prior to these changes, the contact information for portfolio managers and analysts at SEC-registered Hedge Funds in New York was largely unavailable, as they generally did not have to register as IARs. Previously, one would have to rely upon a hodge-podge of personal and internal tools or subscribe to error-ridden commercial personnel database products in order to ascertain these contacts. For those who have met the criteria described above, this is no longer the case. Verifiable names, addresses, contact information, and place of employment for these IARs is now published in an easy-to-use format on the Empirasign web app.


Where is this information sourced from?

The application for initial registration as an investment adviser is made by filing Form ADV with the CRD/IARD system. These forms disclose information including a firm's AUM and compensation structure, allowing us to identify firms which meet the criteria of a hedge fund.


Resources:



Largest Holders by Issuer

Oct. 25, 2021

Discover which funds hold the most valuable positions from any issuer or deal(s) with our Largest Holders by Issuer feature. This tool aggregates and ranks the market value of each fund's portfolio holdings based on our N-PORT Filings database, which provides the most up-to-date holdings data available via the SEC. Simply enter any issuer ticker, CUSIP, or ISIN, select which deals you wish to view, and hit Submit.

Largest Holders by Issuer

The search results table can be sorted by any column (Fund Company, Fund Name, Value Date, Number of Positions, or Market Value). All search results tables can be exported by clicking on the green Excel icon.

Click on any Fund Name to view a complete list of all holdings records for a particular fund from our N-PORT Filings Database. Fund Holdings Search results will include:

  • Complete current and historical portfolio holding records.
  • Directors and officers for the fund company.*
  • Registered representatives for the advisers of the fund.*
  • All pricing services used by the fund.*
  • A visualization of how the fund's assets are allocated across sectors according to their current report, as well as previous filings.
  • Accession Numbers hyperlinked to the original form N-PORT and N-CEN filings on the SEC EDGAR website.*

These results can also be exported via the green CSV icon, located at the top of the page.

*Derived from Form N-CEN filings. See our blog post on how we link bonds to people here.


Previous  |   1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18    19    |  Next