I am trying to split the text in my word processor first into sentences and then into words.

An example paragraph:

When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.

1) This a numbered sentence
2) This is the second numbered sentence

At the same time with his ears and his eyes he offered a small prayer to the child.

Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence

I tried the following code:

import nltk
from nltk.tokenize import TweetTokenizer, sent_tokenize

# text holds the example paragraph shown above

# Attempt 1: split into sentences, then tokenize each one with TweetTokenizer
tokenizer_words = TweetTokenizer()
tokens_sentences = [tokenizer_words.tokenize(t) for t in sent_tokenize(text)]
print(tokens_sentences)

# Attempt 2: the same idea with nltk.word_tokenize
sent_text = nltk.sent_tokenize(text)  # this gives us a list of sentences
# now loop over each sentence and tokenize it separately
for sentence in sent_text:
    tokenized_text = nltk.word_tokenize(sentence)
    print(tokenized_text)

The output I got:

[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'], 
['1', ')', 'This', 'a', 'numbered', 'sentence', '2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence', 'At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples', '-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence',
'-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]

The required output:

[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'], 
['1', ')', 'This', 'a', 'numbered', 'sentence'],
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence'],
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples'],
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence'],
['-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]

How can I split the text into sentences at bullets and numbering?

Solutions using spaCy would also be very helpful.

1
user13491763 May 13, 2020, 12:23

2 Answers

I'm not sure about spaCy. In Ruby you can use PragmaticSegmenter and PragmaticTokenizer:

text = "When the blow was repeated,together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner.\n\n1) This a numbered sentence\n2) This is the second numbered sentence\n\nAt the same time with his ears and his eyes he offered a small prayer to the child.\n\nBelow are the examples\n- This an example of bullet point sentence\n- This is also an example of bullet point sentence"

require 'pragmatic_segmenter'
require 'pragmatic_tokenizer'

final_array = []
segments = PragmaticSegmenter::Segmenter.new(text: text).segment
segments.each do |segment|
  final_array << PragmaticTokenizer::Tokenizer.new(downcase: false).tokenize(segment)
end

final_array
=> 
[
  ["When", "the", "blow", "was", "repeated", ",", "together", "with", "an", "admonition", "in", "childish", "sentences", ",", "he", "turned", "over", "upon", "his", "back", ",", "and", "held", "his", "paws", "in", "a", "peculiar", "manner", "."], 
  ["1", ")", "This", "a", "numbered", "sentence"], 
  ["2", ")", "This", "is", "the", "second", "numbered", "sentence"], 
  ["At", "the", "same", "time", "with", "his", "ears", "and", "his", "eyes", "he", "offered", "a", "small", "prayer", "to", "the", "child", "."], 
  ["Below", "are", "the", "examples"], 
  ["-", "This", "an", "example", "of", "bullet", "point", "sentence"], 
  ["-", "This", "is", "also", "an", "example", "of", "bullet", "point", "sentence"]
]
0
diasks2 May 14, 2020, 12:24
Here is a Python port: spacy.io/universe/project/…
 – 
diasks2
May 14, 2020, 12:25
I tried the Python port above, but it gave me strange results. Can this be solved some other way in Python?
 – 
User 789
May 14, 2020, 12:54

This could be a solution. You can adapt it to your data:

text = """When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.

1) This a numbered sentence
2) This is the second numbered sentence

At the same time with his ears and his eyes he offered a small prayer to the child.

Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence"""



import re
import nltk

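# sent_tokenize leaves list items attached to the neighbouring text,
# as the output in the question shows
sentences = nltk.sent_tokenize(text)
results = []

for sent in sentences:
    # Double the newline before any line that starts with "-" or a digit
    sent = re.sub(r'(\n)(-|[0-9])', r"\1\n\2", sent)
    # Blank lines now mark both paragraph and list-item boundaries
    for s in sent.split('\n\n'):
        results.append(nltk.word_tokenize(s))

results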

[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'], 
['1', ')', 'This', 'a', 'numbered', 'sentence'],
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence'],
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'],
['Below', 'are', 'the', 'examples'],
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence'],
['-', 'This', 'is', 'also', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
]
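
Note that the pattern `(\n)(-|[0-9])` splits before any line that starts with a hyphen or a digit, so a hard-wrapped line beginning with something like "2020" would also be cut. Below is a minimal sketch of a stricter variant that uses only the standard `re` module. The names `LIST_ITEM_BREAK`, `PARAGRAPH_BREAK`, and `split_list_items` are hypothetical, and the set of list markers is an assumption you would need to adapt to your data.

```python
import re

# Assumed list markers (not from the answer above): a line break followed by
# a bullet ("-", "*", "•") or a number marker such as "2)" or "2.", then a space.
LIST_ITEM_BREAK = re.compile(r"\n+(?=[ \t]*(?:[-*•]|\d+[.)])[ \t])")
# One or more blank lines separate paragraphs.
PARAGRAPH_BREAK = re.compile(r"\n[ \t]*\n+")


def split_list_items(chunk):
    """Split one sentence-tokenizer chunk into paragraphs and list items."""
    pieces = []
    for paragraph in PARAGRAPH_BREAK.split(chunk):
        for item in LIST_ITEM_BREAK.split(paragraph):
            # Join hard-wrapped lines into one line and drop empty pieces.
            item = " ".join(item.split())
            if item:
                pieces.append(item)
    return pieces


# The second chunk that sent_tokenize returned in the question.
chunk = (
    "1) This a numbered sentence\n"
    "2) This is the second numbered sentence\n\n"
    "At the same time with his ears and his eyes he offered a small prayer to the child."
)
for piece in split_list_items(chunk):
    print(piece)
```

Each returned piece can then be passed to `nltk.word_tokenize`, in place of the `re.sub` and `split('\n\n')` steps in the loop above.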
0
Marco Cerliani May 15, 2020, 16:52