私は6つのリストまたは配列を持っているとしましょう。各リストには、任意の量の単語があります。リスト内の項目の交差を確認する
0 | 1 | 2 | 3 | 4 | 5 | ... N
-----------------------------------------------------------
cat dog pine tree light fan
cat dog pine tree light fan
cat dog pine tree light fan
cat dog pine tree light fan
cat dog pine tree light fan
私はそれらのすべての単語を入力する気がしませんでしたが、私は交差点を取得したいとしましょう。すべての交差点を見つけることは非常に単純であり、それはこのような機能をpythonで行うことができます。
all = set(zer0).intersection(one).intersection(two).intersection(...N)
私は簡単な解決策が不足していないと比べてこのことを考えておりませんことを確認したいと思います。
上記の例では、私が行う必要がある任意の2つのリストの一致を得るためです。
0&1, 0&2, 0&3, 0&4, 0&5, 0&..N
0&1&2, 0&1&3, 0&1&4, 0&1&5, 0&1&..N
配列ゼロと配列ならば1がゼロと3が行う類似した単語が含まれているが、配列いないものだけを二つのリスト、との例を見ているので、私は尋ねる理由で?
これを一般化する方法はありますか、私はそれが解決されたという強い感情を持っています。
私は、catという単語が ,0,1,2、.. Nリストに表示されていることを知りたいと考えています。
[編集] ここでは私が協力しているsample dataです。
data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8")
data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8")
data2 = unicode("Tropical forests cover a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8")
テキストをクリーンアップし、3つのリストdata0_list、... data2_listにトークン化します。
この後、このような関数呼び出しによってデータが出力されます。
master_list.append(data_0)
master_list.append(data_1)
master_list.append(data_2)
for item in master_list:
for index, item in enumerate(item):
print(index, item)
出力は次のように見えること:この例では
=========== start data_0 ==============
(0, ((u'the',), 13))
(1, ((u'of',), 10))
(2, ((u'rainforests',), 7))
(3, ((u'and',), 7))
(4, ((u'tropical',), 5))
(5, ((u'to',), 4))
(6, ((u'rainforest',), 4))
(7, ((u'in',), 4))
(8, ((u'are',), 4))
(9, ((u'a',), 4))
(10, ((u'it',), 3))
(11, ((u'by',), 3))
(12, ((u'been',), 3))
(13, ((u's',), 3))
(14, ((u'is',), 3))
(15, ((u'there',), 3))
(16, ((u'have',), 2))
(17, ((u'earth',), 2))
(18, ((u'sometimes',), 2))
(19, ((u'also',), 2))
(20, ((u'oxygen',), 2))
(21, ((u'jungle',), 2))
(22, ((u'rainfall',), 2))
(23, ((u'for',), 2))
(24, ((u'through',), 2))
(25, ((u'called',), 2))
(26, ((u'be',), 2))
(27, ((u'world',), 2))
(28, ((u'species',), 2))
(29, ((u'ground',), 2))
(30, ((u'shrubs',), 1))
(31, ((u'may',), 1))
(32, ((u'biotic',), 1))
(33, ((u'from',), 1))
(34, ((u'respiration',), 1))
(35, ((u'known',), 1))
(36, ((u'largest',), 1))
(37, ((u'discovered',), 1))
(38, ((u'two',), 1))
(39, ((u'plants',), 1))
(40, ((u'conditions',), 1))
(41, ((u'insects',), 1))
(42, ((u'necessary',), 1))
(43, ((u'1',), 1))
(44, ((u'convergence',), 1))
(45, ((u'jewels',), 1))
(46, ((u'poor',), 1))
(47, ((u'estimated',), 1))
(48, ((u'if',), 1))
(49, ((u'creating',), 1))
(50, ((u'that',), 1))
(51, ((u'75',), 1))
(52, ((u'growth',), 1))
(53, ((u'penetration',), 1))
(54, ((u'thinned',), 1))
(55, ((u'has',), 1))
(56, ((u'characterized',), 1))
(57, ((u'plays',), 1))
(58, ((u'temperate',), 1))
(59, ((u'production',), 1))
(60, ((u'because',), 1))
(61, ((u'high',), 1))
(62, ((u'98',), 1))
(63, ((u'trough',), 1))
(64, ((u'centimetres',), 1))
(65, ((u'over',), 1))
(66, ((u'some',), 1))
(67, ((u'undiscovered',), 1))
(68, ((u'natural',), 1))
(69, ((u'still',), 1))
(70, ((u'misnamed',), 1))
(71, ((u'all',), 1))
(72, ((u'many',), 1))
(73, ((u'sunlight',), 1))
(74, ((u'millions',), 1))
(75, ((u'dioxide',), 1))
(76, ((u'around',), 1))
(77, ((u'28',), 1))
(78, ((u'monsoon',), 1))
(79, ((u'canopy',), 1))
(80, ((u'photosynthesis',), 1))
(81, ((u'level',), 1))
(82, ((u'177',), 1))
(83, ((u'trees',), 1))
(84, ((u'carbon',), 1))
(85, ((u'one',), 1))
(86, ((u'4',), 1))
(87, ((u'between',), 1))
(88, ((u'areas',), 1))
(89, ((u'responsible',), 1))
(90, ((u'as',), 1))
(91, ((u'vines',), 1))
(92, ((u'450',), 1))
(93, ((u'turnover',), 1))
(94, ((u'leaf',), 1))
(95, ((u'role',), 1))
(96, ((u'indigenous',), 1))
(97, ((u'can',), 1))
(98, ((u'with',), 1))
(99, ((u'types',), 1))
(100, ((u'alternatively',), 1))
(101, ((u'annual',), 1))
(102, ((u'generally',), 1))
(103, ((u'zone',), 1))
(104, ((u'beneath',), 1))
(105, ((u'significant',), 1))
(106, ((u'consuming',), 1))
(107, ((u'microorganisms',), 1))
(108, ((u'applied',), 1))
(109, ((u'soon',), 1))
(110, ((u'2',), 1))
(111, ((u'tangled',), 1))
(112, ((u'250',), 1))
(113, ((u'restricted',), 1))
(114, ((u'undergrowth',), 1))
(115, ((u'medicines',), 1))
(116, ((u'climatic',), 1))
(117, ((u'colonized',), 1))
(118, ((u'forests',), 1))
(119, ((u'dense',), 1))
(120, ((u'pharmacy',), 1))
(121, ((u'quarter',), 1))
(122, ((u'intertropical',), 1))
(123, ((u'term',), 1))
(124, ((u'or',), 1))
(125, ((u'destroyed',), 1))
(126, ((u'processing',), 1))
(127, ((u'3',), 1))
(128, ((u'small',), 1))
(129, ((u'40',), 1))
=========== start data_1 ==============
(0, ((u'the',), 15))
(1, ((u'of',), 8))
(2, ((u'in',), 6))
(3, ((u'and',), 6))
(4, ((u'tropical',), 5))
(5, ((u'cm',), 4))
(6, ((u'to',), 3))
(7, ((u'are',), 3))
(8, ((u'rainforests',), 3))
(9, ((u'forests',), 3))
(10, ((u'south',), 2))
(11, ((u'from',), 2))
(12, ((u'it',), 2))
(13, ((u'g',), 2))
(14, ((u'no',), 2))
(15, ((u'known',), 2))
(16, ((u'rainforest',), 2))
(17, ((u'exceed',), 2))
(18, ((u'although',), 2))
(19, ((u'typically',), 2))
(20, ((u'america',), 2))
(21, ((u'e',), 2))
(22, ((u'many',), 2))
(23, ((u's',), 2))
(24, ((u'between',), 2))
(25, ((u'as',), 2))
(26, ((u'is',), 2))
(27, ((u'with',), 2))
(28, ((u'zone',), 2))
(29, ((u'congo',), 2))
(30, ((u'tropic',), 2))
(31, ((u'equatorial',), 1))
(32, ((u'within',), 1))
(33, ((u'located',), 1))
(34, ((u'convergence',), 1))
(35, ((u'now',), 1))
(36, ((u'el',), 1))
(37, ((u'by',), 1))
(38, ((u'saharan',), 1))
(39, ((u'average',), 1))
(40, ((u'lungs',), 1))
(41, ((u'less',), 1))
(42, ((u'64',), 1))
(43, ((u'have',), 1))
(44, ((u'degreef',), 1))
(45, ((u'temperatures',), 1))
(46, ((u'1',), 1))
(47, ((u'africa',), 1))
(48, ((u'earth',), 1))
(49, ((u'200',), 1))
(50, ((u'australia',), 1))
(51, ((u'18',), 1))
(52, ((u'peninsula',), 1))
(53, ((u'indonesia',), 1))
(54, ((u'that',), 1))
(55, ((u'390',), 1))
(56, ((u'been',), 1))
(57, ((u'10',), 1))
(58, ((u'characterized',), 1))
(59, ((u'also',), 1))
(60, ((u'yucatan',), 1))
(61, ((u'6',), 1))
(62, ((u'such',), 1))
(63, ((u'months',), 1))
(64, ((u'000',), 1))
(65, ((u'islands',), 1))
(66, ((u'trough',), 1))
(67, ((u'dry',), 1))
(68, ((u'66',), 1))
(69, ((u'equator',), 1))
(70, ((u'season',), 1))
(71, ((u'mean',), 1))
(72, ((u'sub',), 1))
(73, ((u'oxygen',), 1))
(74, ((u'degrees',), 1))
(75, ((u'7',), 1))
(76, ((u'rainfall',), 1))
(77, ((u'lanka',), 1))
(78, ((u'all',), 1))
(79, ((u'monthly',), 1))
(80, ((u'cancer',), 1))
(81, ((u'monsoon',), 1))
(82, ((u'asia',), 1))
(83, ((u'on',), 1))
(84, ((u'photosynthesis',), 1))
(85, ((u'degreec',), 1))
(86, ((u'southern',), 1))
(87, ((u'location',), 1))
(88, ((u'addition',), 1))
(89, ((u'sri',), 1))
(90, ((u'capricorn',), 1))
(91, ((u'southeast',), 1))
(92, ((u'warm',), 1))
(93, ((u'found',), 1))
(94, ((u'through',), 1))
(95, ((u'cameroon',), 1))
(96, ((u'climate',), 1))
(97, ((u'called',), 1))
(98, ((u'bosawas',), 1))
(99, ((u'pacific',), 1))
(100, ((u'69',), 1))
(101, ((u'5',), 1))
(102, ((u'can',), 1))
(103, ((u'burma',), 1))
(104, ((u'79',), 1))
(105, ((u'papua',), 1))
(106, ((u'annual',), 1))
(107, ((u'lies',), 1))
(108, ((u'atmosphere',), 1))
(109, ((u'substantial',), 1))
(110, ((u'new',), 1))
(111, ((u'168',), 1))
(112, ((u'category',), 1))
(113, ((u'moist',), 1))
(114, ((u'year',), 1))
(115, ((u'little',), 1))
(116, ((u'contribute',), 1))
(117, ((u'during',), 1))
(118, ((u'175',), 1))
(119, ((u'belize',), 1))
(120, ((u'wet',), 1))
(121, ((u'than',), 1))
(122, ((u'guinea',), 1))
(123, ((u'north',), 1))
(124, ((u'philippines',), 1))
(125, ((u'hawai\u02bbi',), 1))
(126, ((u'myanmar',), 1))
(127, ((u'world',), 1))
(128, ((u'peten',), 1))
(129, ((u'exist',), 1))
(130, ((u'net',), 1))
(131, ((u'a',), 1))
(132, ((u'broader',), 1))
(133, ((u'intertropical',), 1))
(134, ((u'calakmul',), 1))
(135, ((u'central',), 1))
(136, ((u'associated',), 1))
(137, ((u'malaysia',), 1))
(138, ((u'amazon',), 1))
=========== start data_2 ==============
(0, ((u'in',), 11))
(1, ((u'the',), 9))
(2, ((u'and',), 9))
(3, ((u'of',), 7))
(4, ((u'temperate',), 3))
(5, ((u'southern',), 3))
(6, ((u'as',), 3))
(7, ((u'coastal',), 3))
(8, ((u'rainforests',), 3))
(9, ((u'east',), 2))
(10, ((u'parts',), 2))
(11, ((u'america',), 2))
(12, ((u'areas',), 2))
(13, ((u'british',), 2))
(14, ((u'coast',), 2))
(15, ((u'occur',), 2))
(16, ((u'regions',), 2))
(17, ((u'are',), 1))
(18, ((u'turkey',), 1))
(19, ((u'they',), 1))
(20, ((u'on',), 1))
(21, ((u'australia',), 1))
(22, ((u'far',), 1))
(23, ((u'oregon',), 1))
(24, ((u'galicia',), 1))
(25, ((u'chile',), 1))
(26, ((u'island',), 1))
(27, ((u'few',), 1))
(28, ((u'zealand',), 1))
(29, ((u'columbia',), 1))
(30, ((u'but',), 1))
(31, ((u'world',), 1))
(32, ((u'sea',), 1))
(33, ((u'taiwan',), 1))
(34, ((u'northwest',), 1))
(35, ((u'europe',), 1))
(36, ((u'10',), 1))
(37, ((u'much',), 1))
(38, ((u'also',), 1))
(39, ((u'north',), 1))
(40, ((u'adriatic',), 1))
(41, ((u'such',), 1))
(42, ((u'cover',), 1))
(43, ((u'forests',), 1))
(44, ((u'part',), 1))
(45, ((u'including',), 1))
(46, ((u'western',), 1))
(47, ((u'a',), 1))
(48, ((u'norway',), 1))
(49, ((u'large',), 1))
(50, ((u'georgia',), 1))
(51, ((u'well',), 1))
(52, ((u'south',), 1))
(53, ((u'globe',), 1))
(54, ((u'tropical',), 1))
(55, ((u'adjacent',), 1))
(56, ((u'washington',), 1))
(57, ((u'only',), 1))
(58, ((u'russian',), 1))
(59, ((u'pacific',), 1))
(60, ((u'japan',), 1))
(61, ((u'black',), 1))
(62, ((u'along',), 1))
(63, ((u'highlands',), 1))
(64, ((u'ireland',), 1))
(65, ((u'sakhalin',), 1))
(66, ((u'balkans',), 1))
(67, ((u'korea',), 1))
(68, ((u'asia',), 1))
(69, ((u'around',), 1))
(70, ((u'scotland',), 1))
(71, ((u'eastern',), 1))
(72, ((u'alaska',), 1))
(73, ((u'china',), 1))
(74, ((u'isles',), 1))
(75, ((u'new',), 1))
(76, ((u'california',), 1))
単語熱帯雨林、世界を、森や他のいくつかのより一般的な言葉は、すべての3つのデータセットです。
私が今しようとしているのは、複数のリストにある単語を見つけることです。
たとえば、rainforestは3/3リストにあると言うことができます。
単語酸素は2/3のリストにあり、data_0 & data_1にあります。
あなたが見ることができるように、あなたはあなたの質問の異なる解釈に基づいて異なった答えの数を受けてきました。あなたは実際に必要なものを明確にすることができますか? –
私はいくつかのテストデータをいくつかの明確化といくつかのサンプルコードを追加しました。 – user1610950
この場合、交差点ではなくさまざまなリストのアイテムの数を探していますが、少なくとも以下のいずれかの答えが問題の解決に役立つはずです。 –