2017-02-02 3 views

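I am new to NLTK and am trying to extract PERSON, ORGANIZATION, and GPE entities with the code shown below. I have gone through many links without success. With binary=False, how can I extract named entities such as PERSON, ORGANIZATION, and GPE from the tree structure? This is the output I get: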

(S 
    Our/PRP$ 
    direct/JJ 
    competitors/NNS 
    include/VBP 
    ,/, 
    among/IN 
    others/NNS 
    ,/, 
    (PERSON Accenture/NNP) 
    ,/, 
    (GPE Capgemini/NNP) 
    ,/, 
    (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP) 
    ,/, 
    (GPE Genpact/NNP) 
    ,/, 
    (ORGANIZATION HCL/NNP Technologies/NNPS) 
    ,/, 
    (ORGANIZATION HP/NNP Enterprise/NNP) 
    ,/, 
    (ORGANIZATION IBM/NNP Global/NNP Services/NNPS) 
    ,/, 
    (ORGANIZATION Infosys/NNP Technologies/NNPS) 
    ,/, 
    (PERSON Tata/NNP Consultancy/NNP Services/NNPS) 
    and/CC 
    (PERSON Wipro/NNP) 
    ./.) 
(S 
    These/DT 
    markets/NNS 
    also/RB 
    include/VBP 
    numerous/JJ 
    smaller/JJR 
    local/JJ 
    competitors/NNS 
    in/IN 
    the/DT 
    various/JJ 
    geographic/JJ 
    markets/NNS 
    in/IN 
    which/WDT 
    we/PRP 
    operate/VBP 
    which/WDT 
    may/MD 
    be/VB 
    able/JJ 
    to/TO 
    provide/VB 
    services/NNS 
    and/CC 
    solutions/NNS 
    at/IN 
    lower/JJR 
    costs/NNS 
    or/CC 
    on/IN 
    terms/NNS 
    more/RBR 
    attractive/JJ 
    to/TO 
    clients/NNS 
    than/IN 
    we/PRP 
    can/MD 
    ./.) 
(S 
    Our/PRP$ 
    direct/JJ 
    competitors/NNS 
    include/VBP 
    ,/, 
    among/IN 
    others/NNS 
    ,/, 
    (PERSON Accenture/NNP) 
    ,/, 
    (GPE Capgemini/NNP) 
    ,/, 
    (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP) 
    ,/, 
    (GPE Genpact/NNP) 
    ,/, 
    (ORGANIZATION HCL/NNP Technologies/NNPS) 
    ,/, 
    (ORGANIZATION HP/NNP Enterprise/NNP) 
    ,/, 
    (ORGANIZATION IBM/NNP Global/NNP Services/NNPS) 
    ,/, 
    (ORGANIZATION Infosys/NNP Technologies/NNPS) 
    ,/, 
    (PERSON Tata/NNP Consultancy/NNP Services/NNPS) 
    and/CC 
    (PERSON Wipro/NNP) 
    ./.) 
(S 
    The/DT 
    rates/NNS 
    we/PRP 
    are/VBP 
    able/JJ 
    to/TO 
    recover/VB 
    for/IN 
    our/PRP$ 
    services/NNS 
    are/VBP 
    affected/VBN 
    by/IN 
    a/DT 
    number/NN 
    of/IN 
    factors/NNS 
    ,/, 
    including/VBG 
    :/: 
    •/VB 
    our/PRP$ 
    clients’/JJ 
    perceptions/NNS 
    of/IN 
    our/PRP$ 
    ability/NN 
    to/TO 
    add/VB 
    value/NN 
    through/IN 
    our/PRP$ 
    services/NNS 
    ;/: 
    •/NNP 
    introduction/NN 
    of/IN 
    new/JJ 
    services/NNS 
    or/CC 
    products/NNS 
    by/IN 
    us/PRP 
    or/CC 
    our/PRP$ 
    competitors/NNS 
    ;/: 
    •/VB 
    our/PRP$ 
    competitors’/NN 
    pricing/NN 
    policies/NNS 
    ;/: 
    •/VB 
    our/PRP$ 
    ability/NN 
    to/TO 
    accurately/RB 
    estimate/VB 
    ,/, 
    attain/NN 
    and/CC 
    sustain/NN 
    contract/NN 
    revenues/NNS 
    ,/, 
    margins/NNS 
    and/CC 
    cash/NN 
    flows/NNS 
    over/IN 
    increasingly/RB 
    longer/JJR 
    contract/NN 
    periods/NNS 
    ;/: 
    •/NNP 
    bid/NN 
    practices/NNS 
    of/IN 
    clients/NNS 
    and/CC 
    their/PRP$ 
    use/NN 
    of/IN 
    third-party/JJ 
    advisors/NNS 
    ;/: 
    •/VB 
    the/DT 
    use/NN 
    by/IN 
    our/PRP$ 
    competitors/NNS 
    and/CC 
    our/PRP$ 
    clients/NNS 
    of/IN 
    offshore/JJ 
    resources/NNS 
    to/TO 
    provide/VB 
    lower-cost/JJ 
    service/NN 
    delivery/NN 
    capabilities/NNS 
    ;/: 
    •/VB 
    our/PRP$ 
    ability/NN 
    to/TO 
    charge/VB 
    premium/NN 
    prices/NNS 
    when/WRB 
    justified/VBN 
    by/IN 
    market/NN 
    demand/NN 
    or/CC 
    the/DT 
    type/NN 
    of/IN 
    service/NN 
    ;/: 
    and/CC 
    •/VB 
    general/JJ 
    economic/JJ 
    and/CC 
    political/JJ 
    conditions/NNS 
    ./.) 
(S 
    For/IN 
    our/PRP$ 
    internal/JJ 
    management/NN 
    reporting/NN 
    and/CC 
    budgeting/NN 
    purposes/NNS 
    ,/, 
    we/PRP 
    use/VBP 
    non-GAAP/JJ 
    financial/JJ 
    information/NN 
    that/WDT 
    does/VBZ 
    not/RB 
    include/VB 
    stock-based/JJ 
    compensation/NN 
    expense/NN 
    ,/, 
    acquisition-related/JJ 
    charges/NNS 
    and/CC 
    net/JJ 
    non-operating/JJ 
    foreign/JJ 
    currency/NN 
    exchange/NN 
    gains/NNS 
    or/CC 
    losses/NNS 
    for/IN 
    financial/JJ 
    and/CC 
    operational/JJ 
    decision/NN 
    making/NN 
    ,/, 
    to/TO 
    evaluate/VB 
    period-to-period/JJ 
    comparisons/NNS 
    and/CC 
    for/IN 
    making/VBG 
    comparisons/NNS 
    of/IN 
    our/PRP$ 
    operating/NN 
    results/NNS 
    to/TO 
    those/DT 
    of/IN 
    our/PRP$ 
    competitors/NNS 
    ./.) 

The code that produces this output:

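import nltk

# tokcomp (defined elsewhere) is assumed to be an iterable of sentence strings
for i in tokcomp:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    # binary=False keeps the entity types (PERSON, ORGANIZATION, GPE, ...)
    namedEnt = nltk.ne_chunk(tagged, binary=False)
    print(namedEnt)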

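I have not found a method that suits my purpose: extracting the companies tagged as PERSON, ORGANIZATION, and GPE.

If there are any links on how to extract named entities, other than the NLTK website, I would be very grateful.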


Possible duplicate of http://stackoverflow.com/q/31836058/610569 – alvas

Answer


You can apply the code from this link to get the named entities from the result above. I used the nltk.ne_chunk_sents() function instead of nltk.ne_chunk.
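Since the linked code is not reproduced here, the following is only a minimal sketch of one way to do it, not necessarily what the link does. It assumes each chunk in the ne_chunk output is a subtree with a label() sitting directly under the sentence node, as in the output in the question. The names WANTED, extract_entities, and entities_from_sentences are made up for illustration; running the second function needs NLTK's tokenizer, tagger, and NE chunker data to be installed.

```python
from collections import defaultdict

# Entity types to keep; these match the labels in the question's output.
WANTED = ("PERSON", "ORGANIZATION", "GPE")


def extract_entities(tree, labels=WANTED):
    """Return (label, text) pairs for the labelled chunks in one sentence tree."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; chunks are subtrees with label().
        if hasattr(node, "label") and node.label() in labels:
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


def entities_from_sentences(sentences, labels=WANTED):
    """Group entity strings by label across raw sentence strings (hypothetical helper)."""
    import nltk  # imported lazily so extract_entities works without NLTK installed

    tagged = [nltk.pos_tag(nltk.word_tokenize(s)) for s in sentences]
    grouped = defaultdict(list)
    # ne_chunk_sents chunks a whole list of tagged sentences in one call.
    for tree in nltk.ne_chunk_sents(tagged, binary=False):
        for label, text in extract_entities(tree, labels):
            grouped[label].append(text)
    return dict(grouped)
```

Applied to the first tree in the question, extract_entities would return pairs such as ('PERSON', 'Accenture') and ('ORGANIZATION', 'HCL Technologies'). Calling entities_from_sentences(tokcomp) would collect them per label across all sentences. Note that the sketch only reads the labels the chunker assigned, so companies mislabelled as PERSON or GPE stay mislabelled.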
