Data Scientist | 6 to 9 years | Bengaluru
As a Python developer, data engineer, and ML engineer, you would be expected to bring the following skills and take on the following responsibilities:
- Data Scraping: Develop robust data scraping solutions using Python to extract data from diverse sources, including websites, APIs, databases, and files in various formats (e.g., CSV, JSON, XML, PDF, HTML).
- Data Transformation: Cleanse, preprocess, and transform raw data into structured formats suitable for analysis and integration into data pipelines and systems.
- File Type Handling: Demonstrate expertise in handling different types of files and data structures, including structured databases and unstructured text or media files.
- At least 5 years of hands-on experience writing effective, scalable code and developing back-end components in Python.
- Experience in extracting, transforming, and loading (ETL) data from a wide variety of data sources using Python, SQL, and Azure technologies.
- Technical expertise with data models, data mining, and segmentation techniques.
- Strong numerical and analytical skills, with the ability to interpret trends and patterns.
- Ability to build algorithms and prototypes and to prepare data for prescriptive and predictive modelling.
- Familiarity with tools such as Databricks, Spark, PySpark, Airflow, Spark SQL, and Hadoop.
- Design and develop machine learning models to solve business problems.
- Implement and optimize similarity-based methods such as k-nearest neighbors (KNN) and cosine similarity.
- Develop and deploy clustering models (e.g., K-Means, DBSCAN) and classification models (e.g., Logistic Regression, Random Forest).
- Preprocess and clean data to ensure high-quality inputs for models.
- Perform feature engineering to enhance model performance.
- Evaluate model performance using appropriate metrics and refine models as needed.
- Develop and maintain APIs for model serving and integration with other services.
- Implement CI/CD pipelines to automate the deployment and monitoring of machine learning models.
- Collaborate with cross-functional teams to understand business requirements and deliver data-driven solutions.
- Stay up to date with the latest advancements in machine learning and incorporate best practices.
- Document model development processes and maintain code repositories.
Bangalore, IN Noida, IN