This selection process has already been completed.

DATA ENGINEERING INTERNSHIP, MADRID

Holcim EMEA Digital Center, S.L.U.

Madrid (Madrid)

T/2023/43589


What does the company offer?
  • 1 internship vacancy at Holcim EMEA Digital Center, S.L.U., lasting 12 months
  • Study assistance of €1,000.00 gross per month
  • 8 hours per day, full-time
  • Work center in Spain: C/ Julián Camarillo 29-31, Madrid (Madrid)
  • The internship includes enrollment in the Máster de Formación Permanente en Organizaciones Ágiles y Transformación Digital (Universidad Camilo José Cela)
What profile is the company looking for?
  • University graduate or postgraduate: Degree in Statistics, Degree in Mathematics, Degree in Data Engineering, or Degree in Bioinformatics and Big Data
  • Language competencies: English at B2 level.
Proposed training plan

Description

The Data Engineer will join a growing team of analytics experts responsible for expanding and optimizing our data, data infrastructure, processing, and wrangling. The position will be responsible for creating data integration frameworks, setting up data flow solutions, gathering and cleansing data, understanding the meaning and compatibility of datasets, and implementing business rules. The Data Engineer will support data scientists, product managers, and analysts on data initiatives and will ensure that optimal data delivery is consistent across projects. The Data Engineer must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. This role is expected to optimize, or even re-design, our data infrastructure to support our next generation of products, services, and data initiatives. The Data Engineer will also follow the directives defined by Cloud Architects and Enterprise Architects. The output produced by the Data Engineer will be used to generate insights that support strategic decision-making, along with ideas to automate, innovate, and enhance the measurement and reporting process.

Responsibilities

  • Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.
  • Create and schedule data flow tasks and build automated data validation methods, using the pipelines defined in Jenkins and following the recommendations and standards set by the End-to-End (E2E) team (a minimal validation sketch follows this list).
  • Work within the current architecture defined in AWS (Postgres database, OpenShift, AWS services…)
  • Assemble, gather and cleanse data sets that meet business requirements.
  • Query and/or integrate data using SQL or SQL-like languages and ELT/ETL tools and techniques.
  • Create data models by delivering calculations, aggregations, and derivations from input data.
  • Implement business rules required to meet business needs.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies.
  • Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
  • Mine and analyze data from company databases or external data to drive optimization and improvement of product development.
  • Develop custom data models and algorithms to apply to data sets.
  • Find acceptable alternatives that satisfy the needs of multiple stakeholders.
  • Develop processes and tools to monitor and analyze model performance and data accuracy.
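
For illustration only, and not part of the official offer: a minimal sketch of the kind of automated data validation a scheduled data flow task might run, written in Python with pandas. All table and column names here are hypothetical.

    import pandas as pd

    # Hypothetical batch of records; in practice this would arrive from a
    # scheduled data flow task rather than an inline literal.
    batch = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount_eur": [120.0, -5.0, 80.0, None],
    })

    def validate(df):
        """Return a list of human-readable validation failures."""
        errors = []
        if df["order_id"].duplicated().any():
            errors.append("duplicate order_id values")
        if df["amount_eur"].isna().any():
            errors.append("missing amount_eur values")
        if (df["amount_eur"].dropna() < 0).any():
            errors.append("negative amount_eur values")
        return errors

    problems = validate(batch)
    if problems:
        raise ValueError("validation failed: " + "; ".join(problems))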

Position requirements

  • University degree in Computer Science (strongly preferred), Statistics, or Mathematics. A Big Data / data analytics Master’s degree is strongly preferred.
  • Good understanding of data integration practices, including how to schedule data flow tasks in AWS and the use of APIs, cloud connectors, etc.
  • Good understanding of Java and JavaScript would be an advantage.
  • Good knowledge of statistical computing languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets
  • Good knowledge of data analytics architectures and standards, including data warehouses, master data management, ETL, OLAP, data quality management, advanced analytics, BI, visualization, and service-execution rigor and discipline
  • Good understanding of collaborative development environments and the use of Git
  • Good understanding of deployment tools (Docker, Kubernetes…)
  • Good knowledge of development tools such as Anaconda, Jupyter, AWS SageMaker, Visual Studio…
  • Good knowledge of data integration techniques and data management
  • General knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
  • General knowledge of statistical and data mining techniques: GLM/regression, Random Forest, boosting, trees, etc. (a minimal Random Forest sketch follows this list).
  • General knowledge of visualizing/presenting data for stakeholders using QlikView, Qlik Sense, Business Objects, Angular, Python (Jupyter), etc.
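
Again purely illustrative, and not something candidates are asked to reproduce: a minimal scikit-learn sketch of one technique named above, a Random Forest classifier, trained and evaluated on synthetic stand-in data.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for a real company data set.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit the forest and report accuracy on the held-out split.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))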