Scraping Aliexpress with Python - login required

I'm trying to write a web scraper in Python that goes through all the products of an Aliexpress supplier. My problem is that when I run it without logging in, I eventually get redirected to a login web page. I added a login section to my code, but it doesn't help. I would appreciate any suggestions.

My code:

import requests
from bs4 import BeautifulSoup
import re
import sys
from lxml import html


def go_through_paginator(link):
    source_code = requests.get(link, data=payload, headers=dict(referer=link))
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    print(soup)
    for page in soup.findAll('div', {'class': 'ui-pagination-navi util-left'}):
        for next_page in page.findAll('a', {'class': 'ui-pagination-next'}):
            next_page_link = "https:" + next_page.get('href')
            print(next_page_link)
            gather_all_products(next_page_link)


def gather_all_products(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for item in soup.findAll('a', {'class': 'pic-rind'}):
        product_link = item.get('href')
    go_through_paginator(url)


payload = {
    "loginId": "EMAIL",
    "password": "LOGIN",
}

LOGIN_URL = 'https://login.aliexpress.com/buyer.htm?spm=2114.12010608.1000002.4.EihgQ5&return=https%3A%2F%2Fwww.aliexpress.com%2Fstore%2F1816376%3Fspm%3D2114.10010108.0.0.fs2frD&random=CAB39130D12E432D4F5D75ED04DC0A84'

session_requests = requests.session()
source_code = session_requests.get(LOGIN_URL)
source_code = session_requests.post(LOGIN_URL, data=payload)


URL = 'https://www.aliexpress.com/store/1816376?spm=2114.10010108.0.0.fs2frD'

source_code = requests.get(URL, data=payload, headers=dict(referer=URL))
plain_text = source_code.text
soup = BeautifulSoup(plain_text)

for L1 in soup.findAll('li', {'id': 'product-nav'}):
    for L1_link in L1.findAll('a', {'class': 'nav-link'}):
        link = "https:" + L1_link.get('href')
        gather_all_products(link)

And this is the Aliexpress login URL: https://login.aliexpress.com/buyer.htm?spm=2114.12010608.1000002.4.EihgQ5&return=https%3A%2F%2Fwww.aliexpress.com%2Fstore%2F1816376%3Fspm%3D2114.10010108.0.0.fs2frD&random=CAB39130D12E432D4F5D75ED04DC0A84
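To make the symptom easier to spot while debugging, here is a minimal sketch of a check for "did this request end up on the login page?". The helper name `is_login_redirect` and the constant `LOGIN_HOST` are made up for illustration, and the check assumes the login page is always served from the host in the login URL quoted in the question.

```python
from urllib.parse import urlsplit

# Assumption: login pages are served from this host, as in the URL quoted above.
LOGIN_HOST = "login.aliexpress.com"


def is_login_redirect(final_url):
    """Return True if the URL a request ended up at is on the login host."""
    return urlsplit(final_url).hostname == LOGIN_HOST


print(is_login_redirect("https://login.aliexpress.com/buyer.htm?return=x"))
print(is_login_redirect("https://www.aliexpress.com/store/1816376"))
```

With requests, `response.url` holds the final URL after redirects and `response.history` lists the intermediate responses, so passing `response.url` to a check like this would flag the pages that bounced to login.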

Source

2017-01-13 Grzesiu Kropka PL

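Are you doing anything with the cookies that come back? They are probably what authenticates you. You probably need to send them in the headers, but your header seems to contain only the URL? – 1N5818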
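A small offline sketch of the point in this comment, assuming the requests library: cookies stored on a `Session` are attached to requests made through that session, while a request built on its own (which is what a bare `requests.get` does) never sees them. The cookie name `example_auth` and its value are placeholders, not real Aliexpress data, and nothing is sent over the network.

```python
import requests

STORE_URL = "https://www.aliexpress.com/store/1816376"

session = requests.Session()
# Stand-in for a cookie that a successful login response would set.
# The name and value are placeholders chosen for this illustration.
session.cookies.set("example_auth", "token123", domain=".aliexpress.com", path="/")

# Prepared through the session: the session's cookie jar is merged in.
via_session = session.prepare_request(requests.Request("GET", STORE_URL))

# Prepared on its own, like a bare requests.get(): no session jar is consulted.
standalone = requests.Request("GET", STORE_URL).prepare()

print(via_session.headers.get("Cookie"))
print(standalone.headers.get("Cookie"))
```

In the question's code the login is posted through `session_requests`, but the store pages are fetched with `requests.get`, so any cookies the login produced are not sent along. Fetching the pages through the same session object is the likely first thing to try.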

I would probably try to diff the headers between the logged-in and logged-out states, using something like this: https://stackoverflow.com/questions/4423061/view-http-headers-in-google-chrome – 1N5818
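Once the two sets of request headers have been copied out of the browser, the comparison itself can be sketched as below. The function name `diff_headers` is hypothetical and the header values are made up; real values would come from the browser's network tab.

```python
def diff_headers(logged_out, logged_in):
    """Return headers that differ between two requests, keyed by lowercase name.

    Each value is a (logged_out_value, logged_in_value) pair; None means absent.
    """
    # Header names are case-insensitive, so normalise them before comparing.
    out = {k.lower(): v for k, v in logged_out.items()}
    inn = {k.lower(): v for k, v in logged_in.items()}
    return {
        name: (out.get(name), inn.get(name))
        for name in sorted(set(out) | set(inn))
        if out.get(name) != inn.get(name)
    }


# Made-up header values standing in for what the browser shows.
logged_out = {"User-Agent": "Mozilla/5.0", "Referer": "https://www.aliexpress.com/"}
logged_in = {
    "user-agent": "Mozilla/5.0",
    "Referer": "https://www.aliexpress.com/",
    "Cookie": "example_auth=token123",
}
print(diff_headers(logged_out, logged_in))
```

Whatever shows up only in the logged-in set (typically a `Cookie` header) is a candidate for what the scraper needs to send.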

Try setting the cookie values for xman_t and intl_common_forever from the response cookies.

I was trying to fetch all the product information directly. Before setting xman_t and intl_common_forever, I could get only 7 products from Aliexpress. After setting xman_t and intl_common_forever, I successfully get 50 products.
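A minimal sketch of what that could look like with requests, under some assumptions: the helper `pick_cookies` is hypothetical, the jar below stands in for `response.cookies` from an earlier request, and the cookie values are placeholders. The request is only built, not sent, so the resulting `Cookie` header can be inspected offline.

```python
import requests
from requests.cookies import RequestsCookieJar

# Cookie names taken from the answer above.
AUTH_COOKIE_NAMES = ("xman_t", "intl_common_forever")


def pick_cookies(jar, names=AUTH_COOKIE_NAMES):
    """Return a dict of the named cookies that are present in the jar."""
    return {name: jar.get(name) for name in names if jar.get(name) is not None}


# Stand-in for response.cookies from an earlier request; values are placeholders.
response_cookies = RequestsCookieJar()
response_cookies.set("xman_t", "placeholder-1", domain=".aliexpress.com", path="/")
response_cookies.set("intl_common_forever", "placeholder-2", domain=".aliexpress.com", path="/")
response_cookies.set("unrelated", "x", domain=".aliexpress.com", path="/")

cookies = pick_cookies(response_cookies)

# Build (but do not send) the follow-up request to show the resulting header.
prepared = requests.Request(
    "GET", "https://www.aliexpress.com/store/1816376", cookies=cookies
).prepare()
print(prepared.headers["Cookie"])
```

For a real fetch, the same dict could be passed as `requests.get(url, cookies=cookies)`. Whether these two cookies are sufficient is only what this answer reports; the site may require others.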

Hopefully this helps you scrape your products.

Source

2017-06-13 20:28:25
