If you live in Singapore or the Philippines, you are probably familiar with Shopee, where you can find almost anything you want. Sometimes you want to wait for a price cut, and checking the price every day is a job better left to a program. I’m going to show you how to get the latest price of a single product and save your day.
The examples below are Shopee product links in the basic format.
# basic format
https://shopee.{region}/--i.{shopid}.{itemid}
# Philippines
https://shopee.ph/--i.24681653.8727874679
# Singapore
https://shopee.sg/--i.52377417.6108822709
With these links, you can open the product page in your browser.
On this page, you can see the product name (Sony WF…) together with the latest price ($191). This information is not embedded in the initial HTML of the product page, though, which makes things a little harder.
The product information actually comes from another API query:
# API link template for product information
https://shopee.{region}/api/v2/item/get?itemid={}&shopid={}
To retrieve the product information, all you need to do is fill the shop-id and item-id from the product link into the API link template given above:
# Philippines
https://shopee.ph/api/v2/item/get?itemid=8727874679&shopid=24681653
# Singapore
https://shopee.sg/api/v2/item/get?shopid=52377417&itemid=6108822709
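The substitution described above can also be done in code. Here is a minimal sketch, assuming links follow the basic format shown earlier (optionally with a product-name slug before "-i."). The names LINK_PATTERN, API_TEMPLATE and api_url are my own and not part of any Shopee library.

```python
import re

# Matches https://shopee.{region}/...-i.{shopid}.{itemid}
# (assumption: any product-name slug sits between the "/" and "-i.")
LINK_PATTERN = re.compile(
    r'https://shopee\.(?P<region>[a-z.]+)/.*-i\.(?P<shopid>\d+)\.(?P<itemid>\d+)'
)

# The API link template from the article, with named placeholders
API_TEMPLATE = 'https://shopee.{region}/api/v2/item/get?itemid={itemid}&shopid={shopid}'


def api_url(product_link):
    """Turn a product page link into the matching API link."""
    match = LINK_PATTERN.search(product_link)
    if match is None:
        raise ValueError(f'not a recognised product link: {product_link}')
    return API_TEMPLATE.format(**match.groupdict())


print(api_url('https://shopee.ph/--i.24681653.8727874679'))
print(api_url('https://shopee.sg/--i.52377417.6108822709'))
```

The order of the itemid and shopid parameters does not matter, as the two hand-written examples above already suggest.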
If you open these links in your browser, you will see not only the name and price of the product but also much more detailed information.
Here Comes the Code
Now I’m going to share the complete code to get a product’s name and price from Shopee:
import requests


class Shopee():
    def __init__(self, region='tw', proxies=None):
        assert region in ('tw', 'ph', 'sg')
        # pretend to be a desktop Chrome browser
        self.user_agent = (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/89.0.4389.90 Safari/537.36'
        )
        self.headers = {'user-agent': self.user_agent}
        self.proxies = proxies
        self.region = region
        self.url_template = 'https://shopee.' + region \
            + '/api/v2/item/get?itemid={}&shopid={}'

    def _url(self, i_code):
        # i_code is '<shopid>.<itemid>', as found in the product link
        shopid, itemid = i_code.split('.')
        return self.url_template.format(itemid, shopid)

    def _headers(self, params=None):
        headers = self.headers.copy()
        headers.update(params or {})
        return headers

    def _proxies(self):
        return self.proxies

    def _request(self, i_code):
        url = self._url(i_code)
        # the referer must point at the same region as the API call
        headers = self._headers(
            {'referer': f'https://shopee.{self.region}/--i.{i_code}'}
        )
        response = requests.get(
            url,
            headers=headers,
            proxies=self._proxies(),
            timeout=10
        )
        return response

    # you can rewrite this function to retrieve more info.
    def _parse(self, response):
        dct = response.json()
        name = dct['item']['name']
        # the price comes scaled by 100000; int() drops any fractional part
        price = dct['item']['models'][0]['price']
        price = int(price / 100000)
        return name, price

    def query(self, i_code):
        try:
            response = self._request(i_code)
            return self._parse(response)
        except Exception:
            # network, JSON or missing-key errors all end up here
            return None, None
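As the comment in the class says, the parsing step can be rewritten. Here is a minimal sketch of a variant that keeps the fractional part of the price by using Decimal. It assumes the same JSON layout the class reads (item, name, models, price) and the same 100000 scale. The function name parse_item and the sample payload are made up for illustration and are not real Shopee data.

```python
from decimal import Decimal

# Scale factor taken from the class above
PRICE_SCALE = Decimal(100000)


def parse_item(payload):
    """Return (name, price) from a decoded API response, or (None, None)."""
    item = payload.get('item')
    if not item or not item.get('models'):
        # nothing usable in the response
        return None, None
    raw_price = item['models'][0]['price']
    return item['name'], Decimal(raw_price) / PRICE_SCALE


# Hypothetical payload, trimmed to the fields used here
sample = {'item': {'name': 'Example item', 'models': [{'price': 19150000}]}}
print(parse_item(sample))
```

Decimal avoids the rounding surprises of floating-point arithmetic, which matters once you start comparing today’s price with yesterday’s.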
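Let’s try it out. The first step is to find the shop-id and item-id in the product page URL, right after -i. (for example, 24681653.8727874679).
The second step is to pass that code to query(). For Shopee in the Philippines, create Shopee('ph') and call query('24681653.8727874679').
For Shopee in Singapore, create Shopee('sg') and call query('52377417.6108822709'). Each call returns the product name and its price.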
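If you track more than one product, the same call can be wrapped in a small loop. This is only a sketch: check_prices is a helper name I made up, and it assumes a client object whose query() method behaves like the one in the Shopee class above, returning (None, None) on failure.

```python
def check_prices(client, i_codes):
    """Query each '<shopid>.<itemid>' code and collect the results.

    `client` is any object with a query(i_code) method that returns
    (name, price), such as an instance of the Shopee class above.
    """
    results = {}
    for i_code in i_codes:
        name, price = client.query(i_code)
        if name is None:
            # query() signals a failed request or parse with (None, None)
            continue
        results[i_code] = (name, price)
    return results


# Usage, assuming the Shopee class is defined:
# shop = Shopee('sg')
# print(check_prices(shop, ['52377417.6108822709']))
```

Failed lookups are skipped rather than stored, so a missing key in the result tells you which products to retry.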
Congratulations! For tracking a single product’s price, this is more than enough!
“What if I need to crawl every product’s price from Shopee? Is there anything I need to know?”
Yes. First, you need to know every product’s item-id and shop-id, which is beyond the scope of this article.
Second, you need to camouflage yourself; otherwise you will get caught and blocked from visiting Shopee. To avoid this, you need a proxy service as your gear, to hide your original IP behind proxy servers.
I’m not going to show you free proxy services. There is no free lunch: free proxies are unstable and will mess you up. Instead, I’m going to introduce you to the proxy service that I trust the most, Bright Data.
Hide Your IP with Proxy Service
Bright Data, formerly known as Luminati, is one of the best proxy service providers in the world.
To use Bright Data, you need to register first and pay $1 by credit card to create a valid account, as Google does. After that, you get $5 of usage credit and can start using Bright Data’s proxy services.
Let’s start with the most basic proxy service, the data center proxy. Bright Data provides proxies through data centers all over the world. With this service, Shopee sees an IP that belongs to one of these data centers, not the one from your home.
How to Embed Proxy Service into Your Python Code?
You need to get the IP of the proxy server, as well as your username and password, since Bright Data provides private proxy services.
Click the “edit” button on the “Data Center” page, and you will find a place to download the list of proxy IPs you can use. The downloaded IP list should be in this format:
<host>:<port>:<username>:<password>
# host: proxy host
# port: proxy port
# username: your user name bundled with IP
# password: your password
Reformat this into proxy url:
proxies = {
    'https' : f'https://{username}:{password}@{host}:{port}',
    'http' : f'http://{username}:{password}@{host}:{port}',
}
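The reformatting step can be automated for each line of the downloaded list. Here is a minimal sketch, assuming every line follows the host:port:username:password format described above. The function name proxies_from_line is my own, and the address and credentials in the example are placeholders, not a real proxy.

```python
from urllib.parse import quote


def proxies_from_line(line):
    """Build a requests-style proxies dict from one line of the IP list."""
    # maxsplit=3 keeps any ':' inside the password intact
    host, port, username, password = line.strip().split(':', 3)
    # percent-encode the credentials so characters like '@' stay unambiguous
    auth = f'{quote(username, safe="")}:{quote(password, safe="")}'
    return {
        'https': f'https://{auth}@{host}:{port}',
        'http': f'http://{auth}@{host}:{port}',
    }


# Placeholder values for illustration only
print(proxies_from_line('203.0.113.5:8080:user-1:secret'))
```

The dictionary mirrors the one above. If your proxy only accepts plain HTTP connections, requests expects an http:// URL under both keys, so adjust the 'https' entry if the connection fails.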
Now you can use the proxy service in your code by passing this dictionary to the class, for example Shopee('sg', proxies=proxies).
The query returns the same result as before.
The only difference is that Shopee no longer knows your real IP; it can only see the IP of the data center. To verify this, you can visit httpbin through the proxy service, and the website will show the IP you are coming from. This is the test I did:
As a result, httpbin shows that I’m coming from the proxy IP. Good job!
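A check like this can be sketched as follows. It assumes httpbin’s /ip endpoint, which answers with a JSON object whose origin field holds the caller’s IP. The helper name visible_ip and its get parameter are my own additions; the parameter only exists so the function can be tried with a stand-in instead of a live request.

```python
def visible_ip(proxies=None, get=None):
    """Return the IP address that httpbin.org sees for this request."""
    if get is None:
        # imported here so the helper can also be exercised with a stub
        import requests
        get = requests.get
    response = get('https://httpbin.org/ip', proxies=proxies, timeout=10)
    response.raise_for_status()
    return response.json()['origin']


# Usage, with `proxies` built as shown earlier:
# print(visible_ip())          # your own IP
# print(visible_ip(proxies))   # should print the proxy's IP instead
```

If both calls print the same address, the proxy settings are not being applied.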
There Is a Lot More for You to Discover!
The data center proxy is only the most basic service that Bright Data provides, and there are many other kinds of services waiting for you to discover. For IT folks, Bright Data also provides Docker images so that you can quickly deploy a proxy service inside your company!