Beograd, RS, 11070
Data Mining Engineer
About us:
As a leading telecommunications infrastructure provider in the region, CETIN drives digital transformation with a strong presence across Bulgaria, Hungary, Serbia, and Slovakia. Through our subsidiaries, we deliver advanced technology solutions that empower connectivity and enable innovation.
CETIN owns and operates some of the largest telecommunications networks in Bulgaria and Serbia, and is committed to delivering high-quality, cutting-edge infrastructure that meets and exceeds our customers’ technological needs. Our focus is on developing and deploying world-class telecommunications infrastructure to meet the dynamic demands of today’s market.
At CETIN, we bring together highly skilled and motivated experts dedicated to excellence.
Join us to be part of a team shaping the future of telecommunications!
Role Overview:
As a Data Mining Engineer, you will be responsible for designing, implementing, and maintaining scalable data solutions that support real-time and batch processing workflows across diverse network and IT infrastructure domains. This includes integrating heterogeneous data sources (Radio, Core, IP, Optical, IT), building resilient data pipelines, ensuring data quality, and enabling advanced analytics and reporting capabilities.
You will work in a hybrid environment utilizing modern data engineering tools and cloud-native technologies. The role requires hands-on experience with distributed processing, containerization, orchestration, and event-driven systems.
Key Responsibilities:
- Design, develop, and optimize robust ETL/ELT pipelines for structured and unstructured data
- Maintain and extend the data ingestion framework handling large-scale network telemetry, log data, and KPIs
- Integrate data from multiple domains including RAN, transport, OSS/BSS, and IT systems
- Build and manage DAGs in Apache Airflow for workflow orchestration and automation (a brief sketch follows this list)
- Build advanced technical reports in visualization platforms such as MS Power BI, Apache Superset, or similar
- Develop Spark-based jobs for distributed data processing and transformation
- Deploy and manage services within containerized environments using Docker and Kubernetes
- Implement and manage data streaming or messaging patterns via RabbitMQ or similar brokers
- Apply best practices for data integrity, quality validation, monitoring, and lineage
- Collaborate with Data Scientists, DevOps, and domain experts to expose data for analytics and ML workloads
- Maintain technical documentation, CI/CD workflows, and adhere to code versioning standards (e.g., Git)
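For illustration only, here is a minimal sketch of the kind of Airflow DAG referenced above, orchestrating an ingest step followed by a Spark submission. The DAG id, schedule, owner, and script paths are hypothetical examples, not CETIN's actual pipelines (assumes Airflow 2.4+).

    # Minimal Airflow sketch; dag_id, schedule, owner and script paths are hypothetical.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",        # hypothetical team name
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="network_kpi_etl",           # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",                 # batch cadence assumed for the example
        catchup=False,
        default_args=default_args,
    ) as dag:
        # Pull raw telemetry and log files into the landing zone (placeholder script).
        ingest = BashOperator(
            task_id="ingest_raw_telemetry",
            bash_command="python /opt/pipelines/ingest_telemetry.py",
        )

        # Run a Spark job that aggregates network KPIs for reporting (placeholder script).
        aggregate = BashOperator(
            task_id="aggregate_kpis_spark",
            bash_command="spark-submit /opt/pipelines/aggregate_kpis.py",
        )

        ingest >> aggregate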
Required Technical Skills:
- University degree and/or proficiency in Python (Pandas, PySpark, etc.) for data transformation and scripting
- Advanced SQL (analytical queries, window functions, performance tuning); a brief illustration follows this list
- Experience with distributed systems such as Apache Spark, Kafka, or Flink
- Strong command of Linux/Unix environments, bash scripting, and system monitoring
- Experience with Docker containers and orchestration using Kubernetes
- Workflow automation using Apache Airflow (DAG creation, scheduling, dependencies) or Talend Studio
- Working knowledge of RabbitMQ or similar message queues for data decoupling
- Familiarity with CI/CD pipelines, Git-based workflows, and infrastructure as code
- Exposure to monitoring/logging tools (e.g., Prometheus, Grafana, ELK stack) is a plus
- Understanding of network protocols and telecom systems is a strong advantage
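As a brief illustration of the window-function and PySpark skills listed above, the sketch below computes a rolling average and a ranking per radio cell. The dataset, column names, and paths are invented for the example only.

    # Hypothetical PySpark sketch; paths and column names are illustrative only.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("kpi_window_demo").getOrCreate()

    # Assume hourly KPI rows per radio cell with columns: cell_id, ts, throughput_mbps.
    kpis = spark.read.parquet("/data/landing/radio_kpis")

    # 24-sample rolling average of throughput per cell, ordered by timestamp.
    rolling_w = Window.partitionBy("cell_id").orderBy("ts").rowsBetween(-23, 0)

    # Rank each cell's hourly samples by throughput, highest first.
    rank_w = Window.partitionBy("cell_id").orderBy(F.col("throughput_mbps").desc())

    features = (
        kpis
        .withColumn("throughput_24h_avg", F.avg("throughput_mbps").over(rolling_w))
        .withColumn("throughput_rank", F.dense_rank().over(rank_w))
    )

    features.write.mode("overwrite").parquet("/data/curated/radio_kpi_features")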
Soft Skills & Expectations:
- Structured, self-managed, and capable of working independently
- Strong team collaboration in Agile/Scrum environments
- Fluent in English (written and spoken)
- Continuous learner with a proactive approach to new technologies and patterns
What we offer:
• Open working space and flexible working hours
• Training and development programs
• 25 days paid holiday leave per annum
• Laptop, mobile phone and tariff package in line with company policy
• Private health insurance
• Recreation or health program
• Participation in the annual bonus system
• Competitive terms and conditions
Only short-listed candidates will be contacted.