2017-12-24 10 views
0

私はいくつかのWebサイトのwhoisレコードを収集するためにpython whoisライブラリを使用しようとしています。python whois libraryはアクティブなドメイン名のために何も返しません

問題は、アクティブなドメイン名であるnih.govなど、一部のWebサイトでは何も得られていないことです。

w = whois.whois("nih.gov") 
print w 
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None} 

何が問題なのか、どのライブラリを使用するのか、どのようにすべての状況をカバーするのか理解できません。

+1

[whois nih.gov](https://www.whois.com/whois/nih.gov)と[whois stackoverflow.com](https://www.whois.com/whois/)を比較してください。 )これは、「whois」が提供するすべての情報です。 – unutbu

答えて

1

ここでは仕事をするいくつかのコードがあります。例えば

import sys 
import socket 
from datetime import datetime as dt 
import time 

def whois(ip): 

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 
    s.connect(("whois.arin.net", 43)) 
    s.send(('n ' + ip + '\r\n').encode()) 

    response = b"" 

    # setting time limit in secondsmd 
    startTime = time.mktime(dt.now().timetuple()) 
    timeLimit = 3 
    while True: 
     elapsedTime = time.mktime(dt.now().timetuple()) - startTime 
     data = s.recv(4096) 
     response += data 
     if (not data) or (elapsedTime >= timeLimit): 
      break 
    s.close() 

    print(response.decode()) 

def main(): 
    domain = sys.argv[1]; 
    ip = socket.gethostbyname(domain); 
    whois(ip) 

main() 

c:\Temp>py test.py www.google.com 

# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 


# 
# The following results may also be obtained via: 
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2 
# 

NetRange:  216.58.192.0 - 216.58.223.255 
CIDR:   216.58.192.0/19 
NetName:  GOOGLE 
NetHandle:  NET-216-58-192-0-1 
Parent:   NET216 (NET-216-0-0-0-0) 
NetType:  Direct Allocation 
OriginAS:  AS15169 
Organization: Google LLC (GOGL) 
RegDate:  2012-01-27 
Updated:  2012-01-27 
Ref:   https://whois.arin.net/rest/net/NET-216-58-192-0-1 


OrgName:  Google LLC 
OrgId:   GOGL 
Address:  1600 Amphitheatre Parkway 
City:   Mountain View 
StateProv:  CA 
PostalCode:  94043 
Country:  US 
RegDate:  2000-03-30 
Updated:  2017-12-21 
Ref:   https://whois.arin.net/rest/org/GOGL 


OrgAbuseHandle: ABUSE5250-ARIN 
OrgAbuseName: Abuse 
OrgAbusePhone: +1-650-253-0000 
OrgAbuseEmail: [email protected] 
OrgAbuseRef: https://whois.arin.net/rest/poc/ABUSE5250-ARIN 

OrgTechHandle: ZG39-ARIN 
OrgTechName: Google LLC 
OrgTechPhone: +1-650-253-0000 
OrgTechEmail: [email protected] 
OrgTechRef: https://whois.arin.net/rest/poc/ZG39-ARIN 


# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 

、具体的にwww.nih.govのために我々が得る:

c:\Temp>py test.py www.nih.gov 

# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 


# 
# The following results may also be obtained via: 
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2 
# 

NetRange:  23.20.0.0 - 23.23.255.255 
CIDR:   23.20.0.0/14 
NetName:  AMAZON-EC2-USEAST-10 
NetHandle:  NET-23-20-0-0-1 
Parent:   NET23 (NET-23-0-0-0-0) 
NetType:  Direct Allocation 
OriginAS:  AS16509 
Organization: Amazon.com, Inc. (AMAZO-4) 
RegDate:  2011-09-19 
Updated:  2014-09-03 
Comment:  The activity you have detected originates from a dynamic hosting environment. 
Comment:  For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse 
Comment:  For more information regarding EC2 see: 
Comment:  http://ec2.amazonaws.com/ 
Comment:  All reports MUST include: 
Comment:  * src IP 
Comment:  * dest IP (your IP) 
Comment:  * dest port 
Comment:  * Accurate date/timestamp and timezone of activity 
Comment:  * Intensity/frequency (short log extracts) 
Comment:  * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time. 
Ref:   https://whois.arin.net/rest/net/NET-23-20-0-0-1 


OrgName:  Amazon.com, Inc. 
OrgId:   AMAZO-4 
Address:  Amazon Web Services, Inc. 
Address:  P.O. Box 81226 
City:   Seattle 
StateProv:  WA 
PostalCode:  98108-1226 
Country:  US 
RegDate:  2005-09-29 
Updated:  2017-01-28 
Comment:  For details of this service please see 
Comment:  http://ec2.amazonaws.com/ 
Ref:   https://whois.arin.net/rest/org/AMAZO-4 


OrgAbuseHandle: AEA8-ARIN 
OrgAbuseName: Amazon EC2 Abuse 
OrgAbusePhone: +1-206-266-4064 
OrgAbuseEmail: [email protected] 
OrgAbuseRef: https://whois.arin.net/rest/poc/AEA8-ARIN 

OrgTechHandle: ANO24-ARIN 
OrgTechName: Amazon EC2 Network Operations 
OrgTechPhone: +1-206-266-4064 
OrgTechEmail: [email protected] 
OrgTechRef: https://whois.arin.net/rest/poc/ANO24-ARIN 

OrgNOCHandle: AANO1-ARIN 
OrgNOCName: Amazon AWS Network Operations 
OrgNOCPhone: +1-206-266-4064 
OrgNOCEmail: [email protected] 
OrgNOCRef: https://whois.arin.net/rest/poc/AANO1-ARIN 


# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 

異なるオプション

はここで別のオプションがあります。

このコードは、別のサービスからのwhoisリクエストのHTMLファイルをスクリプトのフォルダに作成します。あなたのニーズに合わせて変更することができます、私は基本を書いたばかりです。

import urllib.request 
import tempfile 
import io 
from bs4 import BeautifulSoup 
import sys 

def writeFile(text): 
    with io.open('whoisData.txt', "w", encoding="utf-8") as f: 
     f.write(text) 
    f.close() 

def readHTML(domain): 
    url = 'https://www.whois.com/whois/' + domain 
    html = urllib.request.urlopen(url).read() 
    soup = BeautifulSoup(html) 

    # kill all script and style elements 
    for script in soup(["script", "style"]): 
     script.extract() # rip it out 

    # get text 
    text = soup.get_text() 

    # break into lines and remove leading and trailing space on each 
    lines = (line.strip() for line in text.splitlines()) 
    # break multi-headlines into a line each 
    chunks = (phrase.strip() for line in lines for phrase in line.split(" ")) 
    # drop blank lines 
    text = '\n'.join(chunk for chunk in chunks if chunk) 
    writeFile(text) 

def main(): 
    domain = sys.argv[1] 
    readHTML(domain) 

main() 

は(HTMLなどを解析する上で)hereからいくつかの基準を取りました。

+0

ありがとう@ oBit91、私はそれを使用します。非常に効率的です。もう一つは、私がPythonのWhoisライブラリを使用していたときに、Whoisレスポンスのレジストラレコードを抽出できました。 response.decode()。registrarによってレジストラ名を抽出できますか?大変申し訳ありませんが、今はラップトップにアクセスできません。 –

+0

しかし、それはwhoisレコードを返さないようです。私はpython test.py bbc.comでテストし、印刷された値はもっと有効な私のwhois.whois( "bbc.com")出力と同じではありません。 –

+0

@ShahroozPooryousefオンラインWebを使用してHTMLファイルを解析するという2番目のオプションを追加しました。いずれにしても、あなたは何に最適かを見ることができます。 – oBit91

関連する問題