Richard Petti
Data Engineer, Data Scientist and Physicist
Data Engineer from Long Island, New York with a strong focus on modern libraries, frameworks, and tooling to build ETL pipelines and data services.
About Me
I am a Data Engineer, building ETLs and data solutions with modern cloud architecture and tools. I currently work in the alternative data space at Knoema, which I joined through its acquisition of my previous company, Adaptive Management. I lead the team responsible for building and maintaining ETL pipelines, keeping data flowing to our clients from over 40 data providers (and counting). The breadth of data encountered is one of the exciting things about this role. Technologies used include Nifi, SQL, AWS S3, and Zeppelin Notebooks. Additionally, I provide Data Engineering as a Service for clients, building custom data pipelines and related supporting infrastructure. The main technologies and libraries used here are Python and Dask. Finally, I have worked on an entity resolution service that maps incoming company identifiers (names and/or tickers) in the data to our internal company index, using our company knowledge graph stored in Neo4j, with information indexed via ElasticSearch.
In past roles, I have worked as a Backend Engineer on the business platform team at CA Technologies, building scalable, highly available, intelligent, self-service web applications supporting business operations. I have also worked as a Data Scientist at the publishing company Rodale, focusing on determining the main revenue drivers for the digital side of the business (advertising, web products, e-commerce).
My current success in industry is based on experiences working as a Physicist and Researcher, analyzing big data collected from nuclear colliders and designing the next generation of collider experiments.
My strengths include software development in Python, data wrangling, building data pipelines, and working with graph databases (specifically Neo4j). In short, I just like to figure stuff out and am good at doing so. I am flexible, agile, and quick to adapt to change.
Skills
Python
Pandas
Scikit-learn
Dask
Flask
Django
C++
ROOT
Neo4j
MS SQL Server/Data Warehouse
Postgres
Azure
AWS
ElasticSearch
Nifi
Git
Projects

Citi Bike Helper App

This was my capstone project at The Data Incubator. It involved creating an application to explore data from the Citi Bike sharing system in NYC and included a model to predict ride demand in particular areas of the city.

Python
Flask
Heroku
Pandas
Scikit-learn

Personal Finance Tool

This was just a fun side project to help build a tool for my family to use to keep track of personal expenses.

D3
HTML
PHP
MySQL
WAMP server

Integrating the Design of Detectors and Collider Interaction Region for an Electron-Ion Collider

This is the main project I worked on as a post-doc at Brookhaven National Lab under a grant by the US DOE. The scope was to design critical detector components.

C++
Python
EicROOT
Condor
GEANT
Monte Carlo simulations

Direct Photons as a Probe of Heavy Ion Collisions

This project was the crux of my PhD dissertation, where I made a unique contribution to the body of scientific knowledge on the quark-gluon plasma. I developed a novel photon identification technique that gave first-time access to the energy range measured. The technique I pioneered was later applied to many other data sets by students and researchers following my lead.

C++
ROOT
Condor
GEANT
Monte Carlo simulations
Fourier analysis
