We are looking for a Big Data Engineer who will help us build data pipelines within the various data zones of our Big Data Lake using the Hortonworks Big Data Stack. Your primary focus will be building data pipelines with Spark, Python, and Kafka streaming to bring petabyte-scale data into the Data Lake. Experience with systematic, complex data ingestion, including data registration, format matching, SerDe handling, and metadata preservation and management, is also required. Ideally, you have built data pipelines that provide complete metadata, handle data quality issues and error management for ETL pipelines, and build and preserve data lineage and data linkages.
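By way of illustration, here is a minimal PySpark sketch of the kind of streaming ingestion pipeline this role involves. The broker address, topic name, schema, and lake paths are all hypothetical, not part of our actual environment:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical event schema; real ingestion would resolve this
# during data registration / from a schema registry
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

# Read a Kafka stream and parse each JSON value against the registered schema
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker address
    .option("subscribe", "events")                    # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Land the parsed stream in the raw zone of the Data Lake as Parquet
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/lake/raw/events")                    # assumed lake path
    .option("checkpointLocation", "/lake/checkpoints/events")
    .start()
)
query.awaitTermination()
```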
Develop ETL/data integration pipelines between various source systems and the data warehouse
Use ETL tools such as Spark/Python on the Big Data platform, Kafka streams, and Ab Initio on the traditional data warehouse to design, develop, test, and implement data pipelines that form part of the application functionality
Understand existing data flows, then recreate them or design new data flows, graphs, and plans and implement them in production
Manage and maintain existing and new business rules using ETL or a rules engine, and test and integrate them into the data pipelines
Work with source-side teams to understand data models and source data requirements, create staging and final Data Lake data models, and produce HLDs and LLDs for those models
Use SQL to query data and understand data relationships; use ad-hoc querying to trace data flows and transformations and to perform data reconciliation and validation (a minimal reconciliation sketch follows this list)
Test the data pipelines in the development and QA environments
Consult and work with multiple teams on a daily basis to uncover business needs and data integration and validation requirements.
Help the deployment team move production-grade assets into the production region
Define and set standards for the entire ETL and data integration environment
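As referenced in the SQL responsibility above, here is a minimal reconciliation sketch, assuming hypothetical staging.orders and lake.orders tables already registered in the metastore:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("recon-check").getOrCreate()

# Compare row counts between a staging table and its Data Lake target
# (staging.orders and lake.orders are illustrative table names)
recon = spark.sql("""
    SELECT s.cnt AS staging_count,
           l.cnt AS lake_count,
           s.cnt - l.cnt AS missing_rows
    FROM (SELECT COUNT(*) AS cnt FROM staging.orders) s
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM lake.orders) l
""")
recon.show()
```

A real validation pass would extend this with column-level checksums and business-rule checks, but the pattern of ad-hoc SQL against both sides of a load is the same.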
Qualifications: Any graduate degree; Computer Science preferred
Desired Skills and Competencies:
7-12 years of solid hands-on experience in ETL, data warehousing, data architecture, and data management.
Must have 5 to 7 years of recent hands-on experience with Big Data technologies such as Hadoop, Spark (PySpark), Sqoop, Hive, Kafka, Spark Streaming, Atlas, Falcon, and Ranger, with knowledge of developing UNIX and Python scripts
Must have experience with the Hortonworks Big Data Stack
Must have hands-on experience with PySpark, Kafka, and Spark Streaming for building data pipelines and moving data within the Data Lake.
Must have hands-on experience with Hive, Impala, and other Big Data technologies.
Strong experience in Data Integration and ETL/ECTL/ELT techniques
Must have hands-on experience building data models for Data Lakes, EDWs, and Data Marts using 3NF, denormalized, and dimensional models (star, snowflake, constellation, etc.); a toy star-schema sketch appears after this list
Should have strong technical experience in design (mapping specifications, HLDs, LLDs) and development (coding, unit testing) using Big Data technologies
Should have experience with SQL database programming, SQL performance tuning, and relational model analysis.
Must have the ability to relate to both business and technical members of the team and possess excellent communication skills.
Should be able to provide oversight and technical guidance for developers on ETL and data pipelines
Must have good communication skills and be able to lead meetings, technical discussions, and escalation calls
Must have good documentation skills and be well versed in tools such as Word, Visio, PowerPoint, web portals, etc.
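As referenced in the data modeling requirement above, here is a toy star-schema sketch expressed as Spark SQL DDL, assuming a Hive-enabled Spark session; all table and column names are illustrative:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("star-schema-ddl")
    .enableHiveSupport()  # assumes a Hive metastore is available
    .getOrCreate()
)

# A toy star schema: one fact table keyed to two dimension tables
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key BIGINT,
        customer_name STRING,
        segment STRING
    ) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_date (
        date_key INT,
        calendar_date DATE,
        fiscal_quarter STRING
    ) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        customer_key BIGINT,   -- FK to dim_customer
        date_key INT,          -- FK to dim_date
        amount DECIMAL(18,2)
    ) STORED AS PARQUET
""")
```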