Data Engineer

  • Full-time

Company Description

e.ventures is a leading internet venture capital firm with offices in San Francisco, Hamburg, Moscow, São Paulo, Tokyo and Beijing. With portfolio companies such as NGINX, Recurly, Groupon, Angie's List, Sonos, Yume Networks, Pulse, delicious and shopping.com, an entrepreneurial and ambitious team, as well as funds in five different geographies, e.ventures is uniquely positioned to invest successfully in early-stage internet startups around the world.

Job Description

Build and maintain an infrastructure and user interface for DailyGieselmann.com, which serves as a tool for early trend and startup discovery by our investment team.  

 

Specifically this involves the following:

- Creating software for retrieving, parsing and processing structured and unstructured data from the web and commercial data providers.

- Building indexes and feature extraction software.

- Creating and maintaining automatic HTML reports and visualizations, and interactive JavaScript-based forms.

- Maintaining a small in-house data infrastructure running on PostgreSQL, C++ and your scripting language of choice.

- Maintaining and expanding a small amount (fewer than 3,000 lines) of existing web page generation scripts, written in PHP.

Qualifications

Applicants must be familiar with:

- The details of the HTTP protocol

- Linear regression

- Numerical methods

- PostgreSQL

Applicants must be able to implement high-performance data structures in low-level languages.


A bit more background about the role from one of the engineers:

How I’ve Helped VC Firm e.ventures Discover High Growth Internet Companies

For the past 2½ years, I have been working with the VC firm e.ventures on developing a system to identify continuously growing websites and mobile apps.  I’m originally from Oslo, Norway, where I worked as a freelancer in video game, web and mobile app development.  In the fall of 2010, I joined e.ventures as a freelancer, and in 2012 I became a full-time employee at their office in San Francisco.  My responsibilities as a data engineer have been focused on collecting data from various web sources and moving the information into a highly schematized PostgreSQL database.  The objective of my work was to build an infrastructure and a user interface for DailyGieselmann.com, which would serve as a tool for early trend discovery by our investment team.  I believe data mining is one of the last frontiers of computer science, so this has been right up my alley.

What I’ve been building here is akin to a search engine, so I’ve had the opportunity to work on a variety of problems:

- a web crawler, which serves as an input to the entire process,

- 30 small programs written in C++ and SQL for converting the crawler data into a usable format,

- a tool based on WebKit for capturing screenshots and extracting the normalized HTML from websites,

- cantera-table, a database designed specifically to solve the problems of storing inverted indexes and time series data,

- an index builder, which extracts technical indicators from our time series data and combines that with the features discovered by the various crawler processing scripts (features include language and technology use, CrunchBase categories and much more),

- the web interface itself, which mostly uses JavaScript and PHP.  Even after we have applied filters to the input data, we aren’t left with a short list of excellent investment opportunities.  Hence we need a user interface for the investment team to be able to judge sites efficiently.  Large screenshots help with this, by reducing the need to navigate to each individual website.
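
For readers unfamiliar with the term, the idea of an inverted index over site features can be sketched in a few lines of Python. This is only an illustration of the general data structure, not the cantera-table format or the actual index builder; the site names, feature strings and function names below are all made up for the example.

```python
from collections import defaultdict


def build_inverted_index(site_features):
    """Map each feature to the set of sites that have it.

    site_features: dict of site name -> iterable of feature strings
    (hypothetical input shape, chosen for this sketch).
    """
    index = defaultdict(set)
    for site, features in site_features.items():
        for feature in features:
            index[feature].add(site)
    return index


def sites_with_all(index, *features):
    """Return the sites that carry every one of the given features."""
    sets = [index.get(feature, set()) for feature in features]
    return set.intersection(*sets) if sets else set()


# Hypothetical crawler output: features discovered per site.
crawled = {
    "example-a.test": ["lang:en", "tech:php", "category:ecommerce"],
    "example-b.test": ["lang:de", "tech:php"],
    "example-c.test": ["lang:en", "tech:nginx", "category:ecommerce"],
}

index = build_inverted_index(crawled)
english_shops = sites_with_all(index, "lang:en", "category:ecommerce")
```

Looking up a feature is then a single dictionary access, and combining filters is a set intersection, which is what makes this layout attractive when many sites have to be filtered quickly.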

I’ve had a great experience being the coder for DailyGieselmann.com.  Working closely with one of the General Partners and the investment team, I have been able to help the team find valuable, usable results to support their investment decisions, which support or even lead to investments in great companies like AppAnnie and Munchery, one of my personal favorites ;).  I enjoyed the culture at e.ventures, which is professional, fun and entrepreneurial.    

e.ventures / DailyGieselmann.com background information

San Francisco, March 22nd, 2013 - Venture Capital firm e.ventures is launching the latest version of DailyGieselmann.com, a tracking system created to discover growing websites and mobile apps worldwide. Since 2009, General Partner Tom Gieselmann has created an extensive process around identifying high growth companies with the help of big data. He shares his basic research with the public. The main goal is allowing the community to discover the most consistently growing websites and mobile apps and learn from their success. Entrepreneurs, hackers, product managers and journalists can find up-and-coming trends and hot companies, and identify patterns of successful websites or apps. By providing visibility into what is growing each day, we hope that this resource serves as an inspiration to individuals all over the world, to help them combine elements of successful sites and thereby increase the pace of innovation.

The site is built on top of the infrastructure e.ventures has developed internally to analyze large amounts of data with the help of machine learning, classifiers and time series analysis. The web data collected includes website traffic, iTunes app ranks, iTunes books and movie downloads, as well as Crunchbase information. The service displays a dynamic list of the most consistently growing websites and apps and is updated on a daily basis. The UI enables the user to scan through thousands of websites within a couple of hours, to identify areas of interest.

In analyzing a website’s growth, it is important to distinguish between fast growth and consistent growth. While fast traffic growth can often be of short duration, consistent growth is usually evidence of a great product with a sustainable value proposition. The challenge was to differentiate the two in a scalable way, to make sure the companies showcased in the research are actual trends rather than short-term fads.
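
One simple way to picture this distinction, using the linear regression listed under Qualifications, is to fit a straight line to the logarithm of daily traffic and look at both the slope (how fast) and the R² of the fit (how steady). The sketch below is an assumption made for illustration, not a description of how DailyGieselmann.com actually scores sites; the function names, thresholds and sample numbers are invented.

```python
import math


def growth_fit(series):
    """Least-squares fit of log(value) against the day index.

    Returns (slope, r_squared). The slope is the average daily
    log-growth; r_squared says how well a steady trend explains the data.
    """
    n = len(series)
    xs = range(n)
    ys = [math.log(v) for v in series]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


def is_consistent_growth(series, min_slope=0.01, min_r2=0.9):
    """Illustrative thresholds only: growing, and growing steadily."""
    slope, r_squared = growth_fit(series)
    return slope >= min_slope and r_squared >= min_r2


# Made-up daily visit counts: 3% growth per day versus a brief spike.
steady = [1000 * 1.03 ** day for day in range(30)]
spike = [1000] * 25 + [1000, 9000, 8000, 1500, 1000]

steady_ok = is_consistent_growth(steady)
spike_ok = is_consistent_growth(spike)
```

With these sample numbers the steady series passes and the spike does not: the spike still produces a positive slope, but the poor fit shows that the growth is not sustained.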

e.ventures uses the system on a daily basis in all of its global offices, where it consistently adds to the team’s education and identifies a significant number of potential startup investments that e.ventures reaches out to every week. Interestingly enough, one of the first deals that resulted from this systematic data analysis is a company in exactly that space: AppAnnie, which is the global leader in providing app store analytics of over 10,000 app publishers.  Marshall Nu, COO at AppAnnie, remembers: “When we first met Tom, he showed us the in-house app tracking features on his site.  As big data fans we were very impressed by him and his team's knowledge and understanding of the app space, as well as the obvious value of analytics and market intelligence. This was exactly the kind of investor we wanted to partner with.”

You are still reading.  :)  Thank you.  Here is a bit more information:  

http://techcrunch.com/2013/03/20/e-ventures-daily-gieselmann/

or, you could just go and check out some of the data that we make publicly available at dailygieselmann.com

Additional Information