Project Description:
· Evaluate, extract, and transform data for analytical purposes within a big data environment.
· Use Hive and Spark to build data warehouse applications that maintain large datasets in AWS S3, and select engineering tools based on recommendations.
· Contribute to ETL and aggregation design.
· Design and develop Spark scripts that produce data insights according to business requirements, and collaborate with other teams on integration needs and design.
· Facilitate or perform application support, problem solving, and issue resolution with internal and external resources. Contribute to and review recommendations for technical solutions.
· Resolve big data issues, identifying options for resolution and risk mitigation.
· Use components such as Sqoop, Hive, and Spark for development, along with tools such as Airflow, Genie, Amazon EMR, and the Cloudera Hadoop distribution for assigned tasks.
· Review and approve performance test results, recommendations, and tuning results. Oversee and take responsibility for creating test plans, executing tests, and validating test results.
· Create, administer, size, and configure EMR clusters.
· Develop and unit test applications on the Hadoop and AWS ecosystems.
· Automate and monitor ETL processes and applications.
Qualifications:
· 5+ years of work experience in a data warehouse/BI environment.
· 2+ years of experience working with Hadoop, Hive, and Spark.
· Solid understanding of general BI and ETL concepts.
· Expert-level experience with SQL.
· Working knowledge of Unix/Linux environments, including shell scripting.
· Experience with Python.
· Familiar with AWS technologies such as S3 and EMR.
· Familiar with Airflow.
· Understanding of, and familiarity with, visualization tools such as Tableau.