Expected string or other character buffer object

My file contains the following URLs:

www.example.com 
www.example.com/validagain 
www.example.com/search?q=jsdajasj;kdas  --> trying to get rid of
www.example.com/anothervalid

I was able to isolate the /search links with these regular expressions:

import re

generate_links = re.compile(r'http://(.*)')          # all http links
generate_links2 = re.compile(r'(.*)/eng/(.*)')       # all English URLs
generate_links3 = re.compile(r'(.*)/services/(.*)')  # all services links
generate_links4 = re.compile(r'(.*)/search\?(.*)')   # unwanted search links ("?" escaped to match literally)

# Raw string so the backslash in the Windows path is never read as an escape.
with open(r"VAC\queue.txt", "r") as queued_list, open('newqueue.txt', 'w') as queued_list_updated:
    for links in queued_list:
        url = ""
        services_url = ""
        valid_url = ""
        match = generate_links2.search(links)
        if match is not None:
            url = match.group()
            match2 = generate_links3.search(links)
            if match2 is not None:
                services_url = match2.group()
                print(services_url)
                match3 = generate_links4.search(links)  # matches the unwanted search links

But how do I go back and use the match3 variable to remove those links from services_url, or replace them?

The expected result would be:

www.example.com 
www.example.com/validagain 
www.example.com/anothervalid
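
A minimal sketch of one way to use a compiled pattern for this, assuming the goal is simply to skip every line that matches rather than to edit services_url afterwards. The names SEARCH_LINK and keep_valid are hypothetical, and the sample list stands in for the contents of the file. Note the escaped question mark: an unescaped ? in a regular expression makes the preceding character optional instead of matching a literal "?".

```python
import re

# Hypothetical name; "\?" matches a literal question mark.
SEARCH_LINK = re.compile(r'/search\?')


def keep_valid(lines):
    """Return the lines that do not contain a /search? link."""
    return [line for line in lines if SEARCH_LINK.search(line) is None]


# Sample data copied from the question, standing in for the file.
sample = [
    "www.example.com",
    "www.example.com/validagain",
    "www.example.com/search?q=jsdajasj;kdas",
    "www.example.com/anothervalid",
]

for link in keep_valid(sample):
    print(link)
```

Filtering while reading avoids having to remove anything from a string that was already built.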

Source

2016-06-18 peaceandiago

If you want to get rid of the URLs that contain "search?", try:

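from __future__ import print_function

# "in" is a reserved word, so the file handles need other names.
# The file names are the ones used in the question.
with open(r"VAC\queue.txt") as infile, open('newqueue.txt', 'w') as outfile:
    cured_url = [l for l in infile if 'search?' not in l]

    for url in cured_url:
        # Each line already ends with a newline, so suppress the extra one.
        print(url, end='', file=outfile)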
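
The same filter can be wrapped in a function so that it is easy to try on a small file first. This is only a sketch, assuming one URL per line; filter_queue, its marker parameter, and the temporary paths are hypothetical names, not part of the original answer.

```python
import os
import tempfile


def filter_queue(src_path, dst_path, marker='search?'):
    """Copy src_path to dst_path, skipping lines that contain marker."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if marker not in line:
                dst.write(line)  # the line already ends with its newline


# Small demonstration on a temporary file holding the question's URLs.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'queue.txt')
dst_path = os.path.join(tmp_dir, 'newqueue.txt')
with open(src_path, 'w') as f:
    f.write("www.example.com\n"
            "www.example.com/validagain\n"
            "www.example.com/search?q=jsdajasj;kdas\n"
            "www.example.com/anothervalid\n")

filter_queue(src_path, dst_path)
with open(dst_path) as f:
    result = f.read()
print(result)
```

Iterating over the file object line by line keeps memory use flat for large queue files, whereas building a list holds every line at once.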

Source

2016-06-18 14:35:35
