Web Scraping with Selenium and Python
Imagine what you could do if you could automate all the repetitive and boring activities you perform on the internet, like checking the first Google results for a given keyword every day, or downloading a bunch of files from different websites.
In this post you’ll learn to use Selenium with Python, a web scraping tool that simulates a user surfing the Internet. For example, you can check your social accounts, simulate a user to test your web application, and automate anything in your daily life that is repetitive. The possibilities are infinite! :-)
Here is my example code for scraping data from a sports website. It grabs all the match data and filters it by category (football, cricket, basketball, etc.). This code will help you understand in detail how Selenium works with Python and how to scrape data with this approach.
Requirements:
Step 1 : Create Virtual ENV
You need virtualenv on your local machine. If virtualenv is already installed, create a virtual environment with this command: virtualenv scrapy. If it is not installed, install it first as root: sudo pip install virtualenv. Then activate the environment with: source scrapy/bin/activate.
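The commands above, collected into one shell session (the environment name scrapy comes from the post; pick any name you like):

```shell
# Install virtualenv if it is not already on the machine (needs root)
sudo pip install virtualenv

# Create an isolated environment named "scrapy"
virtualenv scrapy

# Activate it; your shell prompt should now show (scrapy)
source scrapy/bin/activate
```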
Step 2 : Install dependencies in your env.
- BeautifulSoup==3.2.1
- EasyProcess==0.1.9
- PyVirtualDisplay==0.1.5
- argparse==1.2.1
- beautifulsoup4==4.4.1
- selenium==2.47.3
- wsgiref==0.1.2
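Assuming you save the pinned list above into a requirements.txt file, the whole set can be installed into the active environment in one command (note these are old releases; on a modern Python you may need newer versions):

```shell
# Install every pinned dependency from requirements.txt
pip install -r requirements.txt

# Quick sanity check that the two main libraries import
python -c "import selenium, bs4; print(selenium.__version__)"
```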
Step 3 : Download the code from GitHub and run it. You will see the script downloading the match details category by category and writing them to a txt file.
You can run this code in two ways: without arguments or with a date argument.
If you run it as python filename.py, you get the match details for today and tomorrow. If you run it as python filename.py 2015/05/05, you get the match details for that date (2015/05/05).
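A minimal sketch of how that argument handling might look (the function name and date logic here are my own illustration; the actual script on GitHub may differ):

```python
import sys
from datetime import date, timedelta

def dates_to_fetch(argv):
    """Return the list of dates the scraper should fetch.

    With no argument: today and tomorrow.
    With one argument like 2015/05/05: just that date.
    """
    if len(argv) > 1:
        year, month, day = (int(part) for part in argv[1].split("/"))
        return [date(year, month, day)]
    today = date.today()
    return [today, today + timedelta(days=1)]

if __name__ == "__main__":
    for d in dates_to_fetch(sys.argv):
        print(d.strftime("%Y/%m/%d"))
```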
Please make sure pip is installed on your machine.
My script for scraping is here: scrapping file
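To give a concrete picture of the filtering step, here is a minimal sketch of grabbing matches by category. In the real script Selenium supplies the HTML (driver.page_source); the URL, markup, and category attribute below are placeholders of my own, not the sports site's actual structure:

```python
from bs4 import BeautifulSoup

# In the real script, Selenium fetches the page first, e.g.:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("http://example-sports-site.com/matches")  # placeholder URL
#   html = driver.page_source
# A static snippet stands in here so the filtering logic is clear.
html = """
<div class="match" data-category="football">Team A vs Team B</div>
<div class="match" data-category="cricket">Team C vs Team D</div>
<div class="match" data-category="football">Team E vs Team F</div>
"""

def matches_by_category(page_html, category):
    """Return the text of every match in the given category."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.find_all("div", class_="match")
            if div.get("data-category") == category]

if __name__ == "__main__":
    for category in ("football", "cricket", "basketball"):
        lines = matches_by_category(html, category)
        # Like the original script, write each category to its own txt file
        with open(category + ".txt", "w") as f:
            f.write("\n".join(lines))
```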