Sergey Antopolskiy, Data Scientist and Developer in Reggio Emilia, Province of Reggio Emilia, Italy
Member since November 16, 2020
Sergey is an expert data scientist and machine learning engineer. He has solved data analytics, visualization, and modeling problems and developed architecture and pipelines for data-centered workflows. He automated manual production steps in drug manufacturing using ML models, which led to a throughput increase of 200%. Sergey has a scientific background with extensive industry experience and understands complex problems in depth, which lets him choose the approach best suited to generating business value.

Portfolio

  • myInvenio
    Data Reporting, Data Science, Pandas, Java 8, You Only Look Once (YOLO)...
  • CAMLIN Group
    Gantt Charts, Project Management, Data Preprocessing, Data Pipelines...

Experience

Location

Reggio Emilia, Province of Reggio Emilia, Italy

Availability

Part-time

Preferred Environment

Conda, Azure, Docker, Bash Scripting, Jupyter, PyCharm, MacOS, Linux, Python

The most amazing...

...project I've developed is a distributed cloud platform for on-demand training of ML models to find the root causes of abnormalities in business processes.

Employment

  • Senior Data Scientist

    2019 - PRESENT
    myInvenio
    • Designed and implemented a cloud-based platform for on-demand training and deployment of ML models for process mining and business process analysis; this enabled several novel ML algorithms and massively shortened the time to market (TTM) for further ML-based projects.
    • Co-designed and implemented a novel AI/XAI algorithm for extracting root causes of business process anomalies; this bleeding-edge algorithm led to the product being advertised as AI-enabled, multiple sales to large customers, and a partnership with IBM.
    • Automated manual production steps in drug manufacturing using ML models, which led to a 200% throughput increase at the same labor cost; proposed additional cost-effective improvements estimated to reduce manual involvement by 200%.
    • Designed and prototyped an NLP-based ML pipeline, which allowed unsupervised identification of business threads from screen captures and click-and-key logs of PC user activity; implemented a proof of concept (POC) of a PC application to collect the necessary data.
    • Implemented a data quality control pipeline for business process logs and designed a wizard-based UX to guide users in fixing issues with their data; it reduced the load on the helpdesk, as many requests were related to data issues unknown to clients.
    • Implemented a simple yet powerful engine for Business Rules Mining by extending a Java library for decision trees; designed the UI/UX for presenting its results to users; the sales department often cited the new feature as a major selling point.
    • Created numerous automated CI/CD pipelines and development and testing tools for internal use by the data science team, which streamlined workflows and cut testing-related manual labor roughly threefold, as estimated by co-workers.
    Technologies: Data Reporting, Data Science, Pandas, Java 8, You Only Look Once (YOLO), Conda, DevOps, Azure DevOps, Business Rules, Decision Trees, Logistic Regression, Tokenization, Topic Modeling, WinAPI, Tesseract, OCR, Machine Learning Operations (MLOps), Git, Azure Tables, Azure Table Storage, SQL, Process Discovery, Process Mining, Business Process Analysis, Classification Algorithms, Azure Blob Storage API, Bash Scripting, Azure Kubernetes Service (AKS), Azure Functions, Explainable Artificial Intelligence (XAI), LSTM Networks, Gradient Boosting, Gradient Boosted Trees, Microsoft Azure Machine Learning (ML), Azure Blobs, Azure, Python, Computer Vision, Kubernetes, Natural Language Processing (NLP), Random Forests, Artificial Intelligence (AI), RESTful Development, RESTful APIs, MySQL, PostgreSQL, Root Cause Analysis, Anomaly Detection, MLflow, Apache Airflow
  • Senior Research Scientist/Senior Data Scientist

    2017 - 2019
    CAMLIN Group
    • Discovered and fixed a results-invalidating bug in the previously used data analysis pipeline; without my involvement, wrong results would have appeared in a major publication and invalidated the filed patent.
    • Designed and implemented a data preprocessing pipeline for multidimensional biometric data and a deep learning model for real-time prediction of user intentions from that data.
    • Ported a TensorFlow-based ML model to the Jetson TX2 edge device, which allowed model training, deployment, and real-time prediction in a portable battery-powered form.
    • Co-led a multistage project, planning and coordinating work between several parties, including a scientific research lab, an industrial R&D group, and an engineering team, and communicated with the stakeholders.
    • Participated in patenting the discoveries, including an ML model, as an end-to-end approach for Brain-Computer Interface architecture (https://patents.google.com/patent/WO2020211958A1).
    • Created and taught an extensive 3-week course on Applied Data Science for interns and junior employees and conducted internal training.
    Technologies: Gantt Charts, Project Management, Data Preprocessing, Data Pipelines, Principal Component Analysis (PCA), Unsupervised Learning, Classification Algorithms, Logistic Regression, Convolutional Neural Networks, Deep Learning, TensorFlow, Data Quality Analysis, Complex Data Analysis, Time Series, Time Series Analysis, Accelerometers, Experimental Design, Biometrics, HDF, MATLAB, Python 3, Keras, Artificial Intelligence (AI), Anomaly Detection

Experience

  • Cloud Platform for Data-agnostic On-demand ML Training, Deployment, and Serving

    The need was to create a data-agnostic cloud-based platform for ML experiments and production lifecycles, decoupled from the main business analytics software.

    I designed and implemented an Azure-based distributed platform, which included (1) serverless Azure Functions as the platform API and workflow orchestrator, (2) Azure Blob Storage as the data lake, (3) an MSSQL database (later moved to Azure Tables for convenience) for storing states and intermediate results, (4) Azure ML Compute Clusters for running ML algorithms and producing artifacts, (5) an Azure Kubernetes cluster for deploying the models and serving the predictions, (6) a Git repository of algorithms that can be run on demand, and (7) CI/CD pipelines.

    When a user creates a project and uploads the dataset to the main software, the platform accesses the data and runs a series of ML experiments, producing models, predictions and explanations. The predictions and explanations are submitted through the REST API back to the main software, where they are displayed to the user in various scenarios, providing them with detailed insight into their dataset and allowing better decision making. Some of the models are automatically deployed as endpoints to provide real-time predictions.

  • Increased Throughput of a Pharmacological Production Line Via Improved Process Model

    Several stages of drug manufacturing at the client's production plant had a lead time of 1.5 hours per batch and needed constant manual intervention, which prevented the desired scaling of production. The client and I identified the root cause as the lack of a precise model of the amount of chemicals to add to each batch to achieve the desired product properties.

    Using historic process data obtained from the client, I created a precise model, which allowed me to combine several production steps without the direct involvement of the personnel. I achieved this by extracting necessary variables from the time-series signals of the production line sensors and combining them in polynomial regression. This reduced the lead time of the bottleneck manufacturing step to slightly less than 30 minutes, leading to a corresponding increase in the throughput (+200% as estimated by the client) while also reducing the load on the technical staff. I packaged and shipped the model, meeting specific client technical requirements.
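
    The +200% figure is consistent with the lead times quoted above. A minimal check, assuming this step is the only bottleneck and rounding the new lead time to exactly 30 minutes (the text says "slightly less"); the function name is illustrative:

```python
def throughput_gain_percent(old_lead_time_h: float, new_lead_time_h: float) -> float:
    """Relative throughput increase when a bottleneck's lead time shrinks.

    Throughput is batches per hour, i.e. 1 / lead time, so the ratio of
    new to old throughput equals old_lead_time / new_lead_time.
    """
    if old_lead_time_h <= 0 or new_lead_time_h <= 0:
        raise ValueError("lead times must be positive")
    return (old_lead_time_h / new_lead_time_h - 1.0) * 100.0


# 1.5 hours per batch before, assumed 0.5 hours per batch after.
gain = throughput_gain_percent(1.5, 0.5)
print(f"Throughput change: +{gain:.0f}%")
```

    Cutting the lead time to a third triples the number of batches per hour, which is an increase of 200% over the original rate.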

    While working on the project, I proposed several cost-effective improvements to the production line, which are estimated to reduce manual involvement by 200% while increasing the production's precision.

  • Data-agnostic Business Rules Mining (BRM) Algorithm Using Extended Decision Trees

    The goal:
    - Extract business rules describing the conditions under which a process goes from activity A to one of the possible next activities (B, C, etc.)
    - Estimate the consistency of these rules.
    - Present this in a user-friendly form.
    - Take <5 seconds on 1 million business cases.
    - Integrate easily with the main Java software.

    I decided to use Java 8 for that project. I extended a publicly available basic Decision Tree library with many necessary functions, such as pruning, metric estimation, tracking groups, working with missing data, and more. With that and feature engineering/augmentation pipelines, the algorithm obtains classification models for each transition and translates them to text rules, such as "A to B: when X > 10, or X < 1 and Y > 100". These are easily interpreted by business users. Metrics are presented in a user-friendly way, allowing users to judge the consistency of the identified rules. I designed the UX/UI for displaying and exploring the insights and adjusting them to the specific users' datasets (e.g., users can make rules more complex and precise if they want).

    BRM became one of the core features of the software and a key sales point. It became a basis for process simulations, another core feature.
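
    The translation from a classification tree to text rules can be sketched as follows. This is an illustration in Python, not the Java 8 implementation described above; the `Leaf` and `Split` classes, the hand-built tree, and the thresholds are all hypothetical:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # next activity predicted at this leaf


@dataclass
class Split:
    feature: str
    threshold: float
    left: Union["Split", Leaf]   # branch taken when feature <= threshold
    right: Union["Split", Leaf]  # branch taken when feature > threshold


def paths_to(node, target, conds=()):
    """Yield the conditions along every root-to-leaf path that ends in `target`."""
    if isinstance(node, Leaf):
        if node.label == target:
            yield list(conds)
        return
    yield from paths_to(node.left, target, conds + ((node.feature, "<=", node.threshold),))
    yield from paths_to(node.right, target, conds + ((node.feature, ">", node.threshold),))


def simplify(conds):
    """Keep only the tightest bound per (feature, operator) pair."""
    tightest = {}
    for feature, op, thr in conds:
        key = (feature, op)
        if key not in tightest:
            tightest[key] = thr
        elif op == "<=":
            tightest[key] = min(tightest[key], thr)
        else:
            tightest[key] = max(tightest[key], thr)
    return [f"{f} {op} {thr:g}" for (f, op), thr in tightest.items()]


def rule_text(tree, source, target):
    """Join each path with 'and' and alternative paths with 'or'."""
    clauses = [" and ".join(simplify(p)) for p in paths_to(tree, target)]
    return f"{source} to {target}: when " + ", or ".join(clauses)


# Hypothetical tree for the transition out of activity A.
tree = Split("X", 10,
             left=Split("X", 1,
                        left=Split("Y", 100, left=Leaf("C"), right=Leaf("B")),
                        right=Leaf("C")),
             right=Leaf("B"))

print(rule_text(tree, "A", "B"))
# prints: A to B: when X <= 1 and Y > 100, or X > 10
```

    Each root-to-leaf path becomes one conjunction, and the paths that reach the same target activity are joined as alternatives, which gives rules of the shape quoted above.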

  • Brain-computer Interface for Neural Menu Navigator
    https://arxiv.org/pdf/2004.11978.pdf

    We created and tested a brain-computer interface prototype based on the real-time analysis of multidimensional bioelectrical signals (EEG) recorded from the scalp of a car driver, which selects items in the infotainment menu in a completely hands-free way.

    I designed and implemented an EEG data preprocessing pipeline and ML model (based on the convolutional neural network architecture), which was trained on the driver's data and in real-time predicted which infotainment function they wanted to select (navigation, music, etc.). The ML model was ported to the battery-powered portable edge device NVIDIA Jetson TX2, allowing it to work independently inside a car. To increase the project business value, we also collected rich motion data using a set of accelerometer sensors to create future models predicting steering actions.

    I co-led this project; in particular, I coordinated the activity between neuroscientists, engineers, and our research partners from Toyota Motor Europe, and designed the prototype tests and data collection.

    This work resulted in several papers (for a detailed account, see the project URL) and a patent I co-authored (https://patents.google.com/patent/WO2020211958A1).

  • Time-frequency Signatures in Brain Activity Related to Car Control During Driving
    https://www.sciencedirect.com/science/article/abs/pii/S000689931830461X

    I analyzed an electroencephalographic (EEG) dataset to extract patterns related to driving actions: braking, acceleration, and steering. The data consisted of EEG recordings, accelerometer data, and driving simulator data, all of which were multidimensional time series.

    I was invited to the project at a late stage; however, while analyzing the previous work, I found a serious bug in the data analysis, which invalidated the results about to be published. As a consequence, I was asked to join the project full-time to improve the analysis, which was eventually published as a scientific paper and partially patented (https://patents.google.com/patent/WO2019025000A1).

    In particular, my work consisted of synchronizing the data streams from different devices, extracting and filtering event triggers, performing PCA, and decomposing the EEG signals into independent components (ICA), with subsequent time-frequency statistical analysis.
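
    The trigger-extraction step can be sketched as follows. This is an illustration only, assuming a trigger channel sampled alongside the signal and a fixed minimum gap between valid events; the function names, threshold, and window sizes are hypothetical and not taken from the published analysis:

```python
from typing import List, Sequence


def rising_edges(trigger: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Sample indices where the trigger channel crosses `threshold` from below."""
    return [i for i in range(1, len(trigger))
            if trigger[i - 1] < threshold <= trigger[i]]


def drop_close_events(events: Sequence[int], min_gap: int) -> List[int]:
    """Keep an event only if it is at least `min_gap` samples after the last kept one."""
    kept: List[int] = []
    for idx in events:
        if not kept or idx - kept[-1] >= min_gap:
            kept.append(idx)
    return kept


def epochs(signal: Sequence[float], events: Sequence[int],
           before: int, after: int) -> List[List[float]]:
    """Cut a window around each event, skipping windows that run off either end."""
    return [list(signal[e - before:e + after]) for e in events
            if e - before >= 0 and e + after <= len(signal)]


# Toy trigger channel with three onsets; the second one is too close to the first.
trigger = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0]
events = drop_close_events(rising_edges(trigger), min_gap=4)
windows = epochs(list(range(13)), events, before=1, after=2)
print(events, windows)
```

    Windows cut this way share a common time axis relative to the event, which is what later steps such as PCA, ICA, and time-frequency statistics operate on.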

Skills

  • Languages

    Python 3, Python, SQL, Java 8
  • Libraries/APIs

    Azure Blob Storage API, Pandas, REST APIs, TensorFlow, Accelerometers, cuDNN, WinAPI, Keras
  • Tools

    Jupyter, Git, PyCharm, MATLAB, Azure Kubernetes Service (AKS), You Only Look Once (YOLO), LabVIEW, Apache Airflow
  • Paradigms

    Data Science, Azure DevOps, DevOps, Test-driven Development (TDD), RESTful Development, UX Design, UI Design, Anomaly Detection
  • Platforms

    Azure Functions, Docker, Azure, Linux, MacOS, Kubernetes
  • Storage

    Azure Blobs, Azure Table Storage, Azure Tables, Data Pipelines, MySQL, PostgreSQL
  • Other

    Data Visualization, Data Analytics, Machine Learning, Microsoft Azure Machine Learning (ML), Biometrics, Principal Component Analysis (PCA), Feature Engineering, Biomedical Skills, Experimental Design, Complex Data Analysis, Data Quality Analysis, Logistic Regression, Classification Algorithms, Data Preprocessing, Gradient Boosted Trees, Gradient Boosting, Process Mining, Machine Learning Operations (MLOps), Decision Trees, Neuroscience, Data Preparation, Health IT, Experimental Research, Scientific Data Analysis, Polynomial Regression, Linear Regression, Artificial Intelligence (AI), Bash Scripting, Conda, Unsupervised Learning, Clustering, Convolutional Neural Networks, Time Series Analysis, Non-negative Matrix Factorization (NMF), HDF, Time Series, Deep Learning, Gantt Charts, Explainable Artificial Intelligence (XAI), Business Process Analysis, Process Discovery, OCR, Tesseract, Business Rules, APIs, EEG, EEG Libraries for Python, Computational Biology, Computational Statistics, Statistics, Statistical Modeling, Simulations, Synthetic data generation, Data Reporting, Sensor Data, Client Reporting, Random Forests, RESTful APIs, Root Cause Analysis, Digital Signal Processing, LSTM Networks, Topic Modeling, Tokenization, Computer Vision, Natural Language Processing (NLP), MLflow
  • Industry Expertise

    Project Management

Education

  • Ph.D. in Systems Neuroscience
    2011 - 2016
    International School for Advanced Studies - Trieste, Italy
  • Coursework (exchange student) in Computational Neuroscience
    2014 - 2014
    Frankfurt Institute for Advanced Studies - Frankfurt, Germany
  • Master's Degree in Physiology
    2006 - 2011
    Lomonosov Moscow State University - Moscow, Russia
