
Java Developer with Web Crawler Experience

Negotiable Salary

Axiom Software Solutions Limited

Austin, TX, USA


Description

Role: Java Developer with Web Crawler Experience
Location: Austin, TX (Hybrid)

Responsibilities:
1. Web Crawler Development: Design and implement efficient and scalable web crawlers in Java to collect data from various online sources.
2. Data Extraction: Develop and maintain systems for structured data extraction, handling various data formats (HTML, JSON, XML, etc.).
3. Data Storage and Processing: Design data storage and processing pipelines, ensuring extracted data is clean, structured, and easily accessible.
4. Performance Optimization: Optimize web crawling processes for speed, efficiency, and accuracy, while ensuring minimal impact on source websites.
5. Error Handling and Logging: Implement error-handling mechanisms and logging systems to detect and resolve issues during crawling operations.
6. Data Integrity and Compliance: Ensure data collection practices are ethical, legal, and compliant with relevant regulations (e.g., robots.txt, copyright laws).

Requirements:
Proficiency in Java and experience with Java-based web scraping libraries (e.g., Jsoup, Apache HttpClient).
Knowledge of web crawling frameworks and tools, such as Scrapy, Selenium, or Puppeteer.
Strong understanding of HTML, CSS, JavaScript, and web data structures.
Familiarity with data parsing and handling techniques for JSON, XML, and other common formats.
Experience with database technologies (SQL, NoSQL) to store and manage scraped data.
Knowledge of HTTP protocols, headers, proxies, and load handling.
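For illustration only (not part of the original posting), the sketch below shows the kind of fetch-and-parse step the responsibilities describe, using the Jsoup library named in the requirements; the URL, user-agent string, and CSS selector are hypothetical placeholders.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CrawlerSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical target URL; a production crawler would first honor robots.txt.
        String url = "https://example.com/articles";
        Document doc = Jsoup.connect(url)
                .userAgent("example-crawler/0.1") // identify the crawler to the host
                .timeout(10_000)                  // milliseconds; fail fast on slow hosts
                .get();
        // Extract structured data (absolute link URLs and anchor text) from the parsed HTML.
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("abs:href") + "\t" + link.text());
        }
    }
}

In a full crawler this step would sit behind rate limiting, retry and error handling, logging, and a persistence layer (SQL or NoSQL), per the responsibilities above.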

Source: Workable

Location
Austin, TX, USA


You may also like

Workable
Sr. Data Scientist
We are seeking a talented and experienced Sr. Data Scientist to join our team. As a Data Scientist, you will play a crucial role in designing, developing, and implementing data science solutions to empower investigators and enhance our capabilities in extracting meaningful information and identifying anomalous patterns of activity.

Responsibilities:
Collaborate with subject matter experts, team leads, and third-party vendors to define new features and functions for automation, aiding investigators in extracting meaningful information about a target and their surrounding network.
Design, code, test, and document data science microservices, primarily in Python.
Support the integration of disparate bulk data sources into a unified database.
Develop and optimize graph traversal queries and analytic pipelines to support analyst use cases, ensuring a smooth transition from development to test and production environments.
Extract valuable information from unstructured text, including SAR narratives and web-scraped data related to cryptocurrency addresses and actors.
Generate synthetic data for testing and development environments, as well as for the MM capstone training, adapting to evolving MM training and data holdings.

Requirements:
Proficiency in Python programming.
Experience with graph traversal languages such as Gremlin, Cypher, or GraphML, along with expertise in network analytics, including centrality, community detection, link prediction, pattern recognition, and blockchain analytics (preferred).
Strong SQL or other relational database query experience.
In-depth knowledge of graph-structured data and analytics.
Familiarity with Natural Language Processing (NLP) techniques.
Background in financial and banking data analytics, including deriving insights from data and extracting information from unstructured data.
Expertise in blockchain architecture and cryptocurrency data analytics.
Knowledge of GPS technology and its integration into analytical processes.
Experience with cloud platforms, specifically Amazon Web Services (AWS).
Understanding of machine learning algorithms and their application in cybersecurity analytics.

Qualifications:
Bachelor's or advanced degree in Computer Science, Data Science, or a related field.
5+ years of experience in a similar role.
Strong communication and collaboration skills.
Ability to work in a dynamic and fast-paced environment.
An active Top Secret clearance is required.
Tampa, FL, USA
Negotiable Salary
Workable
Senior Big Data Engineer
ABOUT US:
Headquartered in the United States, TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, consistently ranked as the world's top provider of Wi-Fi devices. The company is committed to delivering innovative products that enhance people's lives through faster, more reliable connectivity. With a commitment to excellence, TP-Link serves customers in over 170 countries and continues to grow its global footprint.

We believe technology changes the world for the better! At TP-Link Systems Inc., we are committed to crafting dependable, high-performance products to connect users worldwide with the wonders of technology. Embracing professionalism, innovation, excellence, and simplicity, we aim to assist our clients in achieving remarkable global performance and enable consumers to enjoy a seamless, effortless lifestyle.

KEY RESPONSIBILITIES:
Develop and maintain the Big Data Platform by performing data cleansing, data warehouse modeling, and report development on large datasets.
Collaborate with cross-functional teams to provide actionable insights for decision-making.
Manage the operation and administration of the Big Data Platform, including system deployment, task scheduling, proactive monitoring, and alerting to ensure stability and security.
Handle data collection and integration tasks, including ETL development, data de-identification, and managing data security.
Provide support for other departments by processing data, writing queries, developing solutions, performing statistical analysis, and generating reports.
Troubleshoot and resolve critical issues, conduct fault diagnosis, and optimize system performance.

REQUIRED QUALIFICATIONS:
Bachelor's degree or higher in Computer Science or a related field, with at least three years of experience maintaining a Big Data platform.
Strong understanding of Big Data technologies such as Hadoop, Flink, Spark, Hive, HBase, and Airflow, with proven expertise in Big Data development and performance optimization.
Familiarity with Big Data OLAP tools like Kylin, Impala, and ClickHouse, as well as experience in data warehouse design, data modeling, and report generation.
Proficiency in Linux development environments and Python programming.
Excellent communication, collaboration, and teamwork skills, with a proactive attitude and a strong sense of responsibility.

PREFERRED QUALIFICATIONS:
Experience with cloud-based deployments, particularly AWS EMR; familiarity with other cloud platforms is a plus.
Proficiency in additional languages such as Java or Scala is a plus.

Benefits:
Salary Range: $150,000 - $180,000
Free snacks and drinks, and provided lunch on Fridays
Fully paid medical, dental, and vision insurance (partial coverage for dependents)
Contributions to 401k funds
Bi-annual reviews and annual pay increases
Health and wellness benefits, including free gym membership
Quarterly team-building events

At TP-Link Systems Inc., we are continually searching for ambitious individuals who are passionate about their work. We believe that diversity fuels innovation, collaboration, and drives our entrepreneurial spirit. As a global company, we highly value diverse perspectives and are committed to cultivating an environment where all voices are heard, respected, and valued.
We are dedicated to providing equal employment opportunities to all employees and applicants, and we prohibit discrimination and harassment of any kind based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Beyond compliance, we strive to create a supportive and growth-oriented workplace for everyone. If you share our passion and connection to this mission, we welcome you to apply and join us in building a vibrant and inclusive team at TP-Link Systems Inc. Please, no third-party agency inquiries, and we are unable to offer visa sponsorships at this time.
Irvine, CA, USA
$150,000-180,000/year
Workable
Data & BI Senior Data Engineer
Job Description:
We are seeking a highly skilled and experienced Senior Data Engineer to join our team. The ideal candidate will have a strong background in data engineering, with a specialization in Matillion, SSIS, Azure DevOps, and ETL processes. This role will involve designing, developing, testing, and deploying ETL jobs, collaborating with cross-functional teams, and ensuring efficient data processing.

Key Responsibilities:
Design, develop, test, and deploy Matillion ETL jobs in accordance with project requirements.
Collaborate with the Data and BI team to understand data integration needs and translate them into Matillion ETL solutions.
Create and modify Python code/components in Matillion jobs.
Identify opportunities for performance optimization and implement enhancements to ensure efficient data processing.
Collaborate with cross-functional teams, including database administrators, data engineers, and business analysts, to ensure seamless integration of ETL processes.
Create and maintain comprehensive documentation for Matillion ETL jobs, ensuring knowledge transfer within the team.
Create, test, and deploy SQL Server Integration Services (SSIS) packages and schedule them via the ActiveBatch scheduling tool.
Create Matillion deployment builds using the Azure DevOps CI/CD pipeline and perform release manager activities.
Review code of other developers (L2, L3-BI/DI) to ensure code standards and provide approval as part of code review activities.
Resolve escalation tickets from the L2 team as part of the on-call schedule.
Working knowledge of APIs and the Postman tool is an added advantage.

Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of experience in data engineering, with a focus on ETL processes.
Proficiency in Matillion, SSIS, Azure DevOps, and ETL.
Strong knowledge of SQL, Python, and data integration techniques.
Experience with performance optimization and data processing enhancements.
Excellent collaboration and communication skills.
Ability to work in a fast-paced, dynamic environment.

Preferred Skills:
Experience with cloud platforms such as AWS or Azure.
Knowledge of data warehousing and data modeling.
Familiarity with DevOps practices and CI/CD pipelines.
Strong problem-solving skills and attention to detail.
Atlanta, GA, USA
Negotiable Salary