I am trying to split text first into sentences and then into words as part of my text processing.
An example paragraph:
When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.
1) This a numbered sentence
2) This is the second numbered sentence
At the same time with his ears and his eyes he offered a small prayer to the child.
Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence
I tried the following code:
from nltk.tokenize import TweetTokenizer, sent_tokenize

tokenizer_words = TweetTokenizer()
# split into sentences first, then tokenize each sentence into words
tokens_sentences = [tokenizer_words.tokenize(t) for t in sent_tokenize(input_text)]
print(tokens_sentences)

import nltk

sent_text = nltk.sent_tokenize(text)  # this gives us a list of sentences
# now loop over each sentence and tokenize it separately
for sentence in sent_text:
    tokenized_text = nltk.word_tokenize(sentence)
    print(tokenized_text)
The output I got:
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence', '2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence', 'At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples', '-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence', '-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]
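A likely explanation for this output, sketched with a deliberately naive stand-in for the sentence tokenizer (the regex and the shortened sample text below are assumptions for illustration, not NLTK's Punkt model): a splitter that only breaks after sentence-final punctuation never treats line breaks as boundaries, so list items without a closing period get merged with the lines that follow them.

```python
import re

# Shortened, hypothetical sample with the same shape as the paragraph above.
text = (
    "First paragraph ends with a period.\n\n"
    "1) A numbered item\n"
    "2) Another numbered item\n\n"
    "A closing sentence."
)


def naive_sentences(s):
    """Break only after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", s)


chunks = naive_sentences(text)
for chunk in chunks:
    # The numbered items and the closing sentence come back as one chunk.
    print(repr(chunk))
```

If this is what is happening, the line structure has to be handled separately, either before or after sentence splitting.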
Required output:
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence'],
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence'],
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples'],
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence'],
['-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]
How can I split the text into sentences at bullets and numbering?
Solutions using spaCy would also be very helpful.
1
user13491763
May 13, 2020, 12:23
2 Answers
I'm not sure about spaCy. In Ruby you can use PragmaticSegmenter and PragmaticTokenizer.
require 'pragmatic_segmenter'
require 'pragmatic_tokenizer'

text = "When the blow was repeated,together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner.\n\n1) This a numbered sentence\n2) This is the second numbered sentence\n\nAt the same time with his ears and his eyes he offered a small prayer to the child.\n\nBelow are the examples\n- This an example of bullet point sentence\n- This is also an example of bullet point sentence"
final_array = []
# segment into sentences and list items, then tokenize each segment
segments = PragmaticSegmenter::Segmenter.new(text: text).segment
segments.each do |segment|
  final_array << PragmaticTokenizer::Tokenizer.new(downcase: false).tokenize(segment)
end
final_array
=>
[
["When", "the", "blow", "was", "repeated", ",", "together", "with", "an", "admonition", "in", "childish", "sentences", ",", "he", "turned", "over", "upon", "his", "back", ",", "and", "held", "his", "paws", "in", "a", "peculiar", "manner", "."],
["1", ")", "This", "a", "numbered", "sentence"],
["2", ")", "This", "is", "the", "second", "numbered", "sentence"],
["At", "the", "same", "time", "with", "his", "ears", "and", "his", "eyes", "he", "offered", "a", "small", "prayer", "to", "the", "child", "."],
["Below", "are", "the", "examples"],
["-", "This", "an", "example", "of", "bullet", "point", "sentence"],
["-", "This", "is", "also", "an", "example", "of", "bullet", "point", "sentence"]
]
0
diasks2
May 14, 2020, 12:24
Here is a Python port: spacy.io/universe/project/…
– diasks2
May 14, 2020, 12:25
I tried the Python port mentioned above, but it gave me strange results. Can this be solved some other way in Python?
– User 789
May 14, 2020, 12:54
This could be one solution. You can adapt it to your own data.
import re
import nltk

text = """When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.

1) This a numbered sentence
2) This is the second numbered sentence

At the same time with his ears and his eyes he offered a small prayer to the child.

Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence"""

sentences = nltk.sent_tokenize(text)
results = []
for sent in sentences:
    # insert a blank line before every line that starts with a bullet or a digit
    sent = re.sub(r'(\n)(-|[0-9])', r"\1\n\2", sent)
    # split on blank lines so that each list item becomes its own unit
    for s in sent.split('\n\n'):
        results.append(nltk.word_tokenize(s))
results
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence'],
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence'],
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples'],
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence'],
['-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]
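The same idea can also be sketched with the standard library only. This is a minimal illustration, not a replacement for a trained sentence tokenizer: the function name `split_and_tokenize`, the three regular expressions, and the naive sentence splitter are all assumptions made for this example, and the list-marker pattern only covers `-`, `*`, `•` and `1)`-style markers.

```python
import re

# Hypothetical patterns chosen for this kind of data; adjust them to your own.
LIST_ITEM = re.compile(r"\n(?=\s*(?:[-*•]|\d+\))\s)")  # line break before a list marker
SENT_END = re.compile(r"(?<=[.!?])\s+")                # naive sentence boundary
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")            # a word, or one punctuation mark


def naive_sent_split(block):
    """Stand-in sentence splitter; breaks on abbreviations such as 'Dr. Smith'."""
    return SENT_END.split(block)


def split_and_tokenize(text, sent_split=naive_sent_split, word_split=TOKEN.findall):
    """Return one token list per sentence or list item."""
    results = []
    # 1. Blank lines separate paragraphs.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # 2. Every bullet or numbered line starts its own unit.
        for unit in LIST_ITEM.split(paragraph):
            # 3. Re-join hard-wrapped lines before sentence splitting.
            unit = " ".join(unit.split())
            for sentence in sent_split(unit):
                tokens = word_split(sentence)
                if tokens:
                    results.append(tokens)
    return results


sample = "Below are the examples\n- First bullet\n- Second bullet"
for tokens in split_and_tokenize(sample):
    print(tokens)
```

Because the line structure is handled before sentence splitting, the two callables can be swapped: passing `sent_split=nltk.sent_tokenize` and `word_split=nltk.word_tokenize` should keep NLTK's tokenization, and a spaCy pipeline could be plugged in the same way, for example with `lambda s: [x.text for x in nlp(s).sents]`, assuming `nlp` is a loaded pipeline.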
0
Marco Cerliani
May 15, 2020, 16:52