How do I convert a single-threaded Python script to a multithreaded one?

I want to turn my single-threaded script into a multithreaded one to improve performance by running threads in parallel. The bottleneck is the latency of the requests to the registrar, so I would like to issue more than one request at a time.

import whois

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

for d in find_document:
    try:
        domaine = d['domain']
        print(domaine)
        w = whois.whois(domaine)
        date = w.expiration_date
        print(date)
        collection.update({"domain": domaine}, {"$set": {"expire": date}})
    except whois.parser.PywhoisError:
        print("AVAILABLE")
        collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})

What is the best approach? Use a Pool with map? Some other method?

Thanks for your answers.

Source

2016-12-23 LionelF

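First you need to decide what you want to get out of multithreading/multiprocessing. Do you want to make use of the downtime while waiting on file writes? Do you want to keep the user interface responsive while something is computed in the background? Or are you aiming for better performance by using more cores? – Aaron

I want to improve performance by using more cores. – LionelF

Do you know whether you are limited by compute speed, file writes, or network speed? (If you are dealing with the internet, you are probably waiting on sockets.) (Update your question to reflect this, or someone may flag it as too broad.) – Aaron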

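If you're working over the internet, you'll get real performance gains from threads without getting into the trouble of multiprocessing, because you can wait on several requests at once. Any time you run things in parallel, however, you can run into problems with printing to stdout and with file writes. This is easily fixed with a thread lock. In your case I would simply create one thread for each d in find_document.

Creating a thread takes a few arguments, including:

target=foo  # the function the thread calls when it is started
args=()     # the arguments foo is called with
kwargs={}   # you get the picture

I also reordered your try-except to limit the number of lines in the try block (good practice). To do this I added an else block, which is a very good thing to know is possible (it also works with for and while loops). This let me group your print statements together so they can be locked, which prevents another thread from printing at the same time and scrambling your output. Finally, I didn't know what your collection object is or whether its update method is thread safe, so I wrapped that in a lock as well.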

import threading

import whois

# `collection` is the MongoDB collection object from the question
find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):
    domaine = d['domain']
    try:
        w = whois.whois(domaine)  # try to keep only what's necessary in the try/except block
    except whois.parser.PywhoisError:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine)  # keep the print statements together so the lock isn't held for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

updatelock = threading.Lock()  # I'm not sure update() is thread safe, so we take the safe way out and lock it
printlock = threading.Lock()  # make sure only one thread prints at a time

threads = []
for d in find_document:  # create a list of threads and start them all
    t = threading.Thread(target=foo, args=(d, printlock, updatelock))
    threads.append(t)
    t.start()  # start each thread as we create it

for t in threads:  # wait for all threads to complete
    t.join()
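For readers who have not seen `else` on a `try` or on a loop before, here is a minimal sketch of how it behaves. The helper names `parse_port` and `first_negative` are invented for this illustration and have nothing to do with the whois script.

```python
def parse_port(text):
    """Return the port as an int, or None if text is not a valid port."""
    try:
        port = int(text)  # only the call that can raise lives in the try block
    except ValueError:
        return None
    else:
        # runs only when the try block raised nothing
        return port if 0 < port < 65536 else None


def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break
    else:
        # runs only when the loop finished without hitting break
        found = None
    return found


print(parse_port("80"), parse_port("abc"), first_negative([1, -2, 3]))
```

Keeping the `try` body down to the one call that can fail means an unrelated `ValueError` raised later is not swallowed by accident.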

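As per your comment, you have too much work to try to run it all at the same time, so we need something a little more like a multiprocessing pool than my previous example. The way to do this is to set up threads that each loop over the given function, consuming new arguments until there are none left. To preserve the code I've already written, I'll just add this as a new function that calls foo, but you could write it all as one function.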

import threading

import whois

# `collection` is the MongoDB collection object from the question
find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):
    domaine = d['domain']
    try:
        w = whois.whois(domaine)  # try to keep only what's necessary in the try/except block
    except whois.parser.PywhoisError:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine)  # keep the print statements together so the lock isn't held for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

def consumer(producer):
    while True:
        try:
            with iterlock:  # no idea if the find_document iterator is thread safe... assume not
                d = next(producer)  # unrolling a for loop into a while loop
        except StopIteration:
            return  # we're done
        else:
            foo(d, printlock, updatelock)  # call our function from before

iterlock = threading.Lock()  # lock to get the next element from the iterator
updatelock = threading.Lock()  # I'm not sure update() is thread safe, so we take the safe way out and lock it
printlock = threading.Lock()  # make sure only one thread prints at a time

producer = iter(find_document)  # create an iterator from find_document (what `for _ in _` does under the hood)

threads = []
for _ in range(16):  # create a list of 16 threads and start them all
    t = threading.Thread(target=consumer, args=(producer,))
    threads.append(t)
    t.start()  # start each thread as we create it

for t in threads:  # wait for all threads to complete
    t.join()
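Since the question asked about a pool with map: the standard library's `concurrent.futures.ThreadPoolExecutor` gives a fixed number of worker threads with a `map` method. The following is only a sketch of that alternative, not the approach used above. `lookup_expiry` is a hypothetical stand-in for the whois call, and `LookupError` stands in for `whois.parser.PywhoisError`, so that the example runs without a network or a database.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def lookup_expiry(domain):
    """Hypothetical stand-in for whois.whois(domain).expiration_date."""
    time.sleep(0.01)  # simulate network latency
    if domain.endswith(".invalid"):
        raise LookupError(domain)  # stands in for whois.parser.PywhoisError
    return "2030-01-01"


def check(domain):
    """Return (domain, expiry); expiry is "AVAILABLE" when the lookup fails."""
    try:
        expiry = lookup_expiry(domain)
    except LookupError:
        expiry = "AVAILABLE"
    return domain, expiry


def check_all(domains, max_workers=16):
    # At most max_workers lookups are in flight at any time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map yields results in input order; they are consumed in the
        # calling thread, so printing or database writes done there
        # need no locks.
        return list(pool.map(check, domains))


results = check_all(["example.com", "free.invalid", "example.org"], max_workers=2)
for domain, expiry in results:
    print(domain, expiry)
```

One caveat: `Executor.map` collects its input up front rather than lazily, so with millions of domains it is probably better to feed it in batches than to pass the whole cursor at once.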

Source

2016-12-23 15:30:07 Aaron

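Thank you! – LionelF

It seems to work, but I have millions of domains in my list, and starting all the threads at once blew up mongodb. I need to be able to limit the number of threads. How can I do that? – LionelF

I didn't realize how large this project was. You'll want something like a producer-consumer scheme. I'll write up a short example of that. – Aaron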
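One common way to build a producer-consumer setup in Python is a bounded `queue.Queue` shared by a fixed number of worker threads. The sketch below illustrates that pattern under stated assumptions and is not code from this thread: `process` is a hypothetical stand-in for the whois lookup plus the database update, and the parameter names `num_workers` and `max_pending` are made up.

```python
import queue
import threading

SENTINEL = object()  # unique marker telling one worker to stop


def process(item):
    """Hypothetical stand-in for the whois lookup plus database update."""
    return item.upper()


def worker(tasks, results, results_lock):
    while True:
        item = tasks.get()  # blocks until an item is available
        if item is SENTINEL:
            return
        value = process(item)
        with results_lock:  # guard the shared list explicitly
            results.append(value)


def run(items, num_workers=4, max_pending=100):
    tasks = queue.Queue(maxsize=max_pending)  # bounded: put() blocks when full
    results = []
    results_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, results_lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:  # the calling thread acts as the producer
        tasks.put(item)
    for _ in threads:  # one sentinel per worker
        tasks.put(SENTINEL)
    for t in threads:
        t.join()
    return results


print(sorted(run(["a.com", "b.org", "c.net"], num_workers=2)))
```

Because the queue is bounded, the producer never gets more than `max_pending` items ahead of the workers, which keeps memory use flat even when the input is very large.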
