I need to print all of the raw text of this HTML page, printing the HTML line by line while keeping the correct format.
Each line has this format:
ENSG00000001461&nbsp;&nbsp;&nbsp;&nbsp;ENST00000432012&nbsp;&nbsp;&nbsp;&nbsp;NIPAL3&nbsp;&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;Forward&nbsp;&nbsp;&nbsp;&nbsp;NIPA-like domain containing 3 [Source:HGNC Symbol;Acc:HGNC:25233]<br/>
I want output like this:
ENSG00000001461 ENST00000432012 NIPAL3 5 1 Forward NIPA-like domain containing 3 [Source:HGNC Symbol;Acc:HGNC:25233]
However, the output is:
ENSG00000001461
This is my code:
import urllib.request
from bs4 import BeautifulSoup
species = ['HomoSapiens', 'MusMusculus', 'DrosophilaMelanogaster','CaenorhabditisElegans']
rna_target = ['mRNA', 'lincRNA', 'lncRNA']
db = ['MB21E78v2', 'MB19E65v2', 'MB16E62v1']
species_input = input("Selezionare Specie: ")
target_input = input("Selezionare tipo di RNA: ")
db_input = input("Selezionare DataBase: ")
check = 0
for i in range(len(species)):
    if species_input == species[i]:
        for j in range(len(rna_target)):
            if target_input == rna_target[j]:
                for k in range(len(db)):
                    if db_input == db[k]:
                        check = 1
if check == 1:
    print("Dati Inseriti Correttamente!")
else:
    print("Error: Dati inseriti in modo errato!")
    exit()
url = urllib.request.urlopen("https://cm.jefferson.edu/rna22/Precomputed/OptionController?" + "species=" + species_input + "&type=" + target_input + "&version=" + db_input)
print(url.geturl())
identifier = []
seq_input = input("Digitare ID miRNA: ")
seq = ""
seq = seq_input.split()
print(seq)
for i in range(len(seq)):
    identifier.append(seq[i] + "%20")
s = ""
string = s.join(identifier)
url_tab = urllib.request.urlopen("https://cm.jefferson.edu/rna22/Precomputed/InputController?" + "identifier=" + string + "&minBasePairs=12&maxFoldingEnergy=-12&minSumHits=1&maxProb=.1&" + "version=" + db_input + "&species=" + species_input + "&type=" + target_input)
print(url_tab.geturl())
download = urllib.request.urlopen("http://cm.jefferson.edu/rna22/Precomputed/InputController?download=ALL" + "&ident=" + string + "&minBasePairs=12&maxFoldingEnergy=-12&minSumHits=1&maxProb=.1&" + "version=" + db_input + "&species=" + species_input + "&type=" + target_input)
down_string = download.geturl()
print(down_string)
soup = BeautifulSoup(download, "html5lib")
for match in soup.findAll('br'):
    match.unwrap()
s2 = soup
s1 = s2.body.extract()
print(s1.prettify(formatter=lambda s: s.strip('\xa0')))
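The query strings in the code above are concatenated by hand, which makes it easy to drop a `+` or an `&`. Below is a minimal sketch of the same request URL built with `urllib.parse.urlencode`, with the three nested validation loops replaced by membership tests. The function name `build_input_url` and the constant names are hypothetical, and the sketch joins the identifiers with a single encoded space instead of appending `%20` after every identifier. Whether the server treats both forms the same way is an assumption I have not verified.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the question; everything else here is illustrative.
BASE_URL = "https://cm.jefferson.edu/rna22/Precomputed/InputController"

SPECIES = ["HomoSapiens", "MusMusculus", "DrosophilaMelanogaster", "CaenorhabditisElegans"]
RNA_TARGETS = ["mRNA", "lincRNA", "lncRNA"]
DATABASES = ["MB21E78v2", "MB19E65v2", "MB16E62v1"]


def build_input_url(identifiers, species, rna_type, version):
    """Return the InputController URL for the given selection (hypothetical helper)."""
    # Membership tests do the same job as the nested for/if loops.
    if species not in SPECIES or rna_type not in RNA_TARGETS or version not in DATABASES:
        raise ValueError("unknown species, RNA type or database version")

    params = {
        "identifier": " ".join(identifiers),  # the space is encoded as %20 below
        "minBasePairs": 12,
        "maxFoldingEnergy": -12,
        "minSumHits": 1,
        "maxProb": ".1",
        "version": version,
        "species": species,
        "type": rna_type,
    }
    # quote_via=quote encodes spaces as %20 rather than the default '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

The returned string can be passed to `urllib.request.urlopen` exactly as in the original code.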
Where is the code? – Selcuk
Output from what? –
Please include a [mcve] in your question. Where is the Python? – jonrsharpe
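One possible way to get the one-record-per-line output described in the question is to skip `prettify()` and split the downloaded markup on the `<br/>` tags directly. The sketch below uses only the standard library and assumes that the fields in each row are separated by runs of `&nbsp;` entities (which decode to `'\xa0'`) and that every row ends with a `<br/>` tag. The function name `html_rows_to_text` is hypothetical, and I have not run it against the live page.

```python
import html
import re


def html_rows_to_text(raw_html):
    """Turn <br/>-separated HTML rows into a list of plain-text lines."""
    # Split the markup wherever a <br>, <br/> or <br /> tag occurs.
    rows = re.split(r"<br\s*/?>", raw_html, flags=re.IGNORECASE)
    lines = []
    for row in rows:
        # Drop any other tags that may remain (e.g. <html>, <body>).
        text = re.sub(r"<[^>]+>", "", row)
        # Decode entities such as &nbsp; (becomes '\xa0') and &amp;.
        text = html.unescape(text)
        # Collapse whitespace runs, including non-breaking spaces, to one space.
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            lines.append(text)
    return lines


sample = (
    "ENSG00000001461&nbsp;&nbsp;&nbsp;&nbsp;ENST00000432012"
    "&nbsp;&nbsp;&nbsp;&nbsp;NIPAL3&nbsp;&nbsp;&nbsp;&nbsp;5"
    "&nbsp;&nbsp;&nbsp;&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;Forward"
    "&nbsp;&nbsp;&nbsp;&nbsp;NIPA-like domain containing 3 "
    "[Source:HGNC Symbol;Acc:HGNC:25233]<br/>"
)

for line in html_rows_to_text(sample):
    print(line)
```

With the real response, the argument would be something like `download.read().decode("utf-8")`; the encoding is an assumption.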