On this channel, I provide tutorials for working with Python in a digital humanities project. I design my videos and tutorials for humanists who have no coding experience. I am a medieval historian by trade, but I create my videos with all humanists in mind. If you want to interact with the videos in more dynamic ways, check out my website, www.PythonHumanities.com. On that site, I host live coding exercises and quizzes. It is still a work in progress and will be complete during the Summer of 2020. I post 1-10 videos per week, so check back frequently.
My output shows one huge cluster and many tiny clusters. What does that mean? Note: the first cluster clearly represents the topic. All the other clusters contain irrelevant words, generally with a frequency of 1. Should I remove all those tokens from the corpus?
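In case it helps, a minimal sketch of dropping frequency-1 tokens before re-clustering, assuming the corpus is a list of token lists (the variable names and sample data are just illustrative):

from collections import Counter

# corpus: list of documents, each a list of token strings (toy example)
corpus = [["medieval", "history", "history"], ["medieval", "charter"]]

# Count how often each token appears across the whole corpus
freq = Counter(token for doc in corpus for token in doc)

# Keep only tokens that occur more than once
filtered = [[token for token in doc if freq[token] > 1] for doc in corpus]
print(filtered)  # [['medieval', 'history', 'history'], ['medieval']]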
train_spacy() update for spaCy 3 as follows (note: in spaCy 3, nlp.add_pipe("ner") creates and returns the component, so the old create_pipe call is no longer needed):

import random
import spacy
from spacy.training import Example

def train_spacy(data, iterations):
    TRAIN_DATA = data
    nlp = spacy.blank("en")  # blank fresh English model
    if "ner" not in nlp.pipe_names:  # if the model does not have an ner pipeline
        ner = nlp.add_pipe("ner", last=True)  # in spaCy 3, add_pipe returns the component
    else:
        ner = nlp.get_pipe("ner")
    for _, annotations in TRAIN_DATA:  # each element in TRAIN_DATA is a tuple of (text, {"entities": []})
        for ent in annotations.get("entities"):  # each entity is (start, end, label), e.g. (0, 20, "PERSON")
            ner.add_label(ent[2])  # ent[2] could be PERSON, ORG, etc.
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    # with nlp.disable_pipes(*other_pipes):  # deprecated line
    with nlp.select_pipes(disable=other_pipes):
        optimizer = nlp.begin_training()  # deprecated alias of nlp.initialize() in spaCy 3
        for itn in range(iterations):
            print("Starting iteration", itn)
            # Randomly shuffle the training data every iteration, a common practice
            # in machine learning so the model doesn't just memorize the order
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                # spaCy 2 call, commented out:
                # nlp.update(
                #     [text],
                #     [annotations],
                #     drop=0.2,  # dropout helps prevent overfitting
                #     sgd=optimizer,
                #     losses=losses
                # )
                # spaCy 3 call:
                doc = nlp.make_doc(text)
                example = Example.from_dict(doc, annotations)
                nlp.update([example], sgd=optimizer, losses=losses, drop=0.2)
            print(losses)
    return nlp  # return the trained model
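A quick usage sketch, assuming training data in the usual spaCy format (the sample sentence and output path are just illustrative):

TRAIN_DATA = [
    ("Harold Godwinson died at Hastings in 1066.", {"entities": [(0, 16, "PERSON")]}),
]
nlp = train_spacy(TRAIN_DATA, 10)
nlp.to_disk("ner_model")  # save the trained model for later use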
If you're using the above render method in a Jupyter notebook, it will return an empty HTML string. The concrete problem is that displacy.render auto-detects that you're in a Jupyter notebook and displays the output directly instead of returning HTML. Important note: to explicitly enable or disable "Jupyter mode", you can use the jupyter keyword argument, e.g. to return raw HTML in a notebook, or to force Jupyter rendering if auto-detection fails: html = displacy.render(sentence, style="dep", jupyter=False) :)
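A small follow-up sketch in case you want to keep the markup around, assuming sentence is an already-parsed Doc (the file name is just an example):

from spacy import displacy

html = displacy.render(sentence, style="dep", jupyter=False)
with open("parse.html", "w", encoding="utf-8") as f:
    f.write(html)  # open this file in a browser to see the dependency tree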
Now, what if you want a default entry, so that if the user hits Enter nothing changes?

for i, entry in enumerate(temp):
    if i == int(entry_option):
        field1 = entry["field1"]
        field2 = entry["field2"]
        temp_data = {}
        # input() returns "" on a bare Enter, which is falsy, so `or`
        # falls back to the current value and nothing changes
        temp_data["field1"] = str(input(f"current {field1}: ") or field1)
        temp_data["field2"] = str(input(f"current {field2}: ") or field2)
        new_data.append(temp_data)
Thanks for the video. I was having issues with the "getSkewAngle" function, but I found an easy workaround: I changed the last line "return -1.0 * angle" to just "return angle". Hope this helps anyone else with this problem.
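For context, a rough sketch of the kind of minAreaRect-based skew detection being discussed (not the exact tutorial code); the angle convention of cv2.minAreaRect changed around OpenCV 4.5, which is likely why the sign flip helps on some versions and hurts on others:

import cv2
import numpy as np

def get_skew_angle_sketch(image):
    # Grayscale, blur, and invert-threshold so the text becomes foreground
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # The minimum-area rectangle around all foreground pixels gives the text rotation
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]

    # Normalize: older OpenCV returns angles in [-90, 0), newer in [0, 90)
    if angle > 45:
        angle -= 90
    return angle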
Interesting! I have been working with whisperX and whisper-timestamped on my MacBook so far and wasn't aware of MLX. Thanks for sharing! But since you emphasize the word-level timestamps: with standard whisper those are known to be very inaccurate (i.e. pretty much unusable; whisper is simply not trained to predict timestamps). So, are you suggesting that timestamps in whisper-mlx are better?
Regarding your example with Auschwitz: how exactly did it learn that Auschwitz belongs to the concentration_camp type? Is it because your example sentence happened to say exactly that, or is it just a coincidence?
Simply put: LLMs can't access your company's information directly. RAG lets you share relevant documents with the LLM so it can answer your questions using your own data.
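A minimal sketch of the idea, with a toy word-overlap retriever and a hypothetical call_llm() stand-in for whatever model you use (nothing here is a specific library's API; real systems use embeddings and a vector store):

documents = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am-5pm, Monday to Friday.",
]

def retrieve(question, docs, k=1):
    # Score each document by how many question words it shares
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:k]

def answer(question):
    # Stuff the retrieved documents into the prompt so the model answers from your data
    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)  # call_llm is a hypothetical stand-in for your LLM call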
@@python-programming Okay, so it means downloading and installing the LLM on my device and then using it? And can all the open-source LLMs be downloaded?
@@RamandeepSingh_04 Yes, but Small Language Models will typically run smoothly on newer phones or PCs with lower-end GPUs; Large LMs need systems with more VRAM, or likely the soon-to-come AI chipsets. Personally, my M1 Mac really struggles to run a smaller Dolphin-Llama model, and I think I'd need to upgrade to M3 silicon or newer. I have 16GB of VRAM in my PC GPU, so it might run better there.
It will, but the latency may be an issue. I'm not aware of anything that can do real-time NER the way you can get transcriptions in near real time.
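A rough sketch of one way to approximate it: buffer the streaming transcript and run spaCy's NER on each chunk once a sentence completes (the transcript_chunks generator is hypothetical):

import spacy

nlp = spacy.load("en_core_web_sm")

buffer = ""
for chunk in transcript_chunks:  # hypothetical generator yielding transcript text
    buffer += chunk
    # Only run NER at sentence boundaries to keep latency manageable
    if buffer.rstrip().endswith((".", "?", "!")):
        doc = nlp(buffer)
        for ent in doc.ents:
            print(ent.text, ent.label_)
        buffer = ""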
Hi, congratulations, your videos are very successful. However, I have a slightly more specific problem. I'm trying to read the writing on tire treads, but OCR doesn't work. I can improve OCR performance somewhat with various preprocessing steps, but those steps are not adaptive and don't produce the effect I expect. What is your advice for a problem like the writing on a tire surface, which suffers from low contrast and can't be solved by preprocessing alone?
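Not the questioner's pipeline, but one adaptive-contrast idea worth trying on embossed, low-contrast surfaces like tires is CLAHE (local histogram equalization) before an adaptive threshold; the file name and parameter values here are just starting points:

import cv2

image = cv2.imread("tire.jpg", cv2.IMREAD_GRAYSCALE)  # path is illustrative

# CLAHE boosts contrast locally, which helps when lighting varies across the tire
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
enhanced = clahe.apply(image)

# Adaptive thresholding adjusts per neighborhood instead of using one global cutoff
binary = cv2.adaptiveThreshold(enhanced, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 10)
cv2.imwrite("tire_preprocessed.png", binary)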
Thanks a lot, but when I draw the network, the sizes of the most frequent nodes are big while the other nodes disappear. The code is the same; I don't understand my mistake.
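In case it's the scaling: when node sizes come straight from raw frequencies, rare nodes can shrink to invisibility. A small sketch of rescaling sizes into a fixed range with networkx (the graph and frequencies are toy data):

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph([("king", "queen"), ("king", "knight"), ("queen", "bishop")])
freq = {"king": 100, "queen": 40, "knight": 2, "bishop": 1}  # toy frequencies

# Map raw frequencies into a visible range, e.g. 100-1000, so no node vanishes
lo, hi = min(freq.values()), max(freq.values())
sizes = [100 + 900 * (freq[n] - lo) / (hi - lo) for n in G.nodes()]

nx.draw(G, node_size=sizes, with_labels=True)
plt.show()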