Responsible for implementing solutions that cover data sourcing, pipeline engineering, persistence, and presentation capabilities.
Design and build a high-performing, scalable data pipeline platform using Hadoop, Apache Spark, and Apache Kafka.
Design and build a near-real-time data warehouse using Apache Hudi, Apache Iceberg, Spark 3, Flink, ClickHouse, etc.
Design and build data servers using microservices, Elasticsearch, and container-based architecture.
Establish change management, monitoring, alerting, and ticketing processes for data lakes.
Build robust and scalable data infrastructure (both batch and real-time) to support the needs of internal users.
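At toy scale, the batch-plus-real-time infrastructure described above can be sketched as a producer/consumer pair; in a real deployment the queue would be an Apache Kafka topic and the consumer a Spark or Flink job, and all names and the doubling transform here are hypothetical illustrations only.

```python
import queue
import threading

def producer(q: queue.Queue, events: list) -> None:
    # Stand-in for a Kafka topic feeding the real-time path.
    for e in events:
        q.put(e)
    q.put(None)  # sentinel marking end of stream

def consumer(q: queue.Queue, sink: list) -> None:
    # Stand-in for a streaming job (e.g. Spark/Flink) writing to a sink;
    # the *2 transform is a placeholder for real processing logic.
    while True:
        e = q.get()
        if e is None:
            break
        sink.append(e * 2)

def run_stream(events: list) -> list:
    # Wire producer and consumer together on separate threads.
    q: queue.Queue = queue.Queue()
    sink: list = []
    t1 = threading.Thread(target=producer, args=(q, events))
    t2 = threading.Thread(target=consumer, args=(q, sink))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return sink
```

The same shape covers the batch path: replace the unbounded stream with a finite input set and drain it to completion.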
Knowledge, Skills and Education:
4+ years' experience building large-scale big data applications
Strong software development, problem-solving, and debugging skills, with experience in one or more of the following languages: Java, Python, Scala, or shell
Experience with multi-threading, concurrency, and highly scalable microservices and REST web services
Experience with Hadoop-ecosystem tools for real-time and batch data ingestion, processing, and provisioning, such as NiFi, Kettle, the ELK stack, Hive, Spark, Flink, Azkaban, Airflow, ClickHouse, and Vertica
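As a minimal, self-contained illustration of the multi-threading and concurrency skills listed above, the sketch below fans a batch of records across a thread pool; the record schema and the `transform` function are hypothetical, and a production pipeline would run this kind of logic inside Spark or Flink rather than a hand-rolled pool.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record: dict) -> dict:
    # Hypothetical per-record cleanup step (placeholder logic).
    return {"id": record["id"], "value": record["value"] * 2}

def run_batch(records: list, workers: int = 4) -> list:
    # Fan the records out across a thread pool; map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, records))
```

`pool.map` keeps results in input order regardless of which thread finishes first, which is the property a downstream ordered sink would rely on.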