
Open-R1: a fully open reproduction of DeepSeek-R1
Hey there! This article is an intro to the project, not a claim that we have reproduced R1 yet. We're building in the open, so as soon as we have evaluation numbers, we'll share them. You can follow our progress on Hugging Face and GitHub.
Well, there must be at least some sanity check and validation to make sure the model was trained properly.
Oh yes, if you are referring to the evaluation numbers for DeepSeek's model, they're coming very soon!
As discussed in the blog post, there is no model called Open-R1 to evaluate at all … not yet anyway. This is a blog post explaining that Hugging Face will take the DeepSeek R1 model, work out how it was built as laid out in the paper and from what they released, and then replicate that process.
In fact this is pretty much how science works … A comes up with a plan, discovery or invention, and it is tested by B, C and D to see if it is reproducible. That's been the foundation of research now for a few centuries.
This blog is not saying they have already done so … It's a blog post outlining an intent to start training a model like R1 and calling it Open-R1.
Also, DeepSeek-R1 was only released last week, and even in their paper they laid out the compute hours needed. While those are low compute hours for a SOTA model, this does not mean you can train said model in a week. I'd personally love to be able to train a transformer model in a week, but we may have to wait a while for that level of compute innovation.
So there are no benchmarks for a model that has not been built yet, right? As outlined in the blog post, and again in reply to your question.
But fear not, there is a GitHub repo already and contributors (hell, I may join myself), some prelim work done, and a master plan. A good starting position.
@edbeeching has already evaluated the released models (src: https://x.com/edwardbeeching/status/1884273209136275742)
R1 just trained on o1 outputs, so jointly …/s. This is what the new AI czars are saying.
That's nice, and it's important to understand this massive hype that lacks technical comprehension and explanation. Science is about reproduction, and if they claim to be open, let them fulfill the open part.
Please do publish the training cost.
We will!
Excalidraw
Hi @bojan2501, thanks! We will definitely be working hard to make sure this training recipe can work for small language models on consumer hardware, since not everyone has a cluster of H100s at home :-) The tool we used for the images was Excalidraw! https://excalidraw.com
should be a joke
5.5M is the number reported in the DeepSeek-V3 tech report (just the training run, not the experiments afaik); for R1 it's hard to estimate tbh, but much less than 5.5M imo.
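For what it's worth, and going from memory of the V3 tech report, that headline figure is not measured spend but simply the reported H800 GPU-hours multiplied by an assumed $2/GPU-hour rental price; treat the exact numbers below as approximate:

```python
# Back-of-envelope check of the DeepSeek-V3 headline training cost
# (figures from memory of the tech report; treat them as approximate).
gpu_hours = 2_788_000       # reported total H800 GPU-hours for the final training run
price_per_hour = 2.0        # rental price per H800 GPU-hour assumed in the report
cost = gpu_hours * price_per_hour
print(f"${cost / 1e6:.3f}M")  # -> $5.576M
```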
The code for the models is inside the model repositories, e.g. for V3: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py
Hello Team, I'm Ray Bernard, the author and developer of EQUATOR. My research group will be working on a paper focused on replicating certain elements of DeepSeek R1. Our objective is to replicate the cold start and supply your team with a dataset that includes CoT and other techniques to support these efforts. We'd like to contribute our work to help. Please let me know if you find this useful. Best, Ray Bernard https://www.facebook.com/groups/1186310571520299/
Where are the evaluation numbers? Without them you can't call it a reproduction.
8 replies
True, but it looks like there's nothing to be evaluated as of today. I assume the ultimate goal is to train a new reasoning model and then use the same evaluation metrics as o1 and DeepSeek-R1.
That's quite interesting; I was asking myself why the concerns the author raised here are not being asked by others? I think the work they have done is remarkable, but at the same time I wonder why they wouldn't point out these missing pieces if they are supposed to be fully open.
Why, even without reproduction and understanding of the method, could they impact the market so much in this way?
4 replies
Hi! This post is an intro to the project, not a claim that we've replicated R1 yet. We will fully share the missing pieces when we have them; you can expect the models and datasets to be uploaded to this Hugging Face org and the code to be in this GitHub repo.
Interesting read, and it is good to see more effort in this direction: more optimization and less brute force.
Also wondering what tool the author used for producing the step diagram.
2 replies
Excalidraw
I'm so glad that efforts like this already exist; I'm gonna try to contribute :-)
1 reply
Looking forward to it!
So racist article
2 replies
WTF are you talking about?
Awesome to have this open reproduction started!
For Step #1, check out https://github.com/open-thoughts/open-thoughts!
https://x.com/ryanmart3n/status/1884284101265612856
Let’s do this thing!
1 reply
It's really cool to see how the entire open source community comes together!
Does anyone know the real training cost of R1? I can't find it in the paper or the . Is the 6M cost reported by media just the number taken from V3's training cost?
2 replies
Oops …
Has anybody asked the DeepSeek team to release their training data and code, or at least share them privately with an independent replication project like this? Have they refused such a request?
A faithful replication depends on using the same dataset and hyperparameters. Otherwise, any major discrepancies with the published benchmarks would be hard to pin down, whether they come from training data differences or from the replication method itself.
1 reply
Historically, they have never released code or datasets of their LLM training, so I wouldn't expect this time to be different. If they did release it, that would be amazing of course!
In the meantime we have to make best-guess estimates and see if we can get there ourselves.
You lay out a good replication process for DeepSeek's reasoning training. I will try something similar to it.
This is really good information; can we fine-tune it for a specific use case once the code is released?
1 reply
Yes, of course!
Please consider removing biased, contaminated or unaligned training data, and make an effort to remove copyrighted works from the crawl. This will make the model more usable. If you reused Anthropic's curation checks, this might also help; removing obviously biased data will likely add a lot of value. We don't want another tainted, unaligned open source model, right? And no company would ever use DeepSeek or a model that reuses it, right?
We appreciate your work for the benefit of humanity, we hope.
Miike C from NJ
1 reply
So basically you're asking to replace existing censorship with another flavour of censorship?
Can't wait! Hopefully the model will be uncensored, but whatever you can do is alright! Love seeing open source building itself up. I'm not smart enough to actually help, but I can contribute moral support lol
Hello guys, I am just looking for the code for DeepSeek-V2, in order to fully understand multi-head latent attention. You don't seem to have code on Hugging Face even for that. Or am I missing something? I don't see anything in src/transformers/models. MLA is not properly described in their paper, so it would be essential to have code for this.
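Not the official code, but the core trick of MLA as I understand it from the DeepSeek-V2 paper: down-project the token stream into one small shared latent, cache only that, and up-project it into per-head keys and values at attention time. A rough NumPy sketch of the shape flow (all dimensions and weight names here are made up for illustration, and it omits details like the decoupled RoPE path, query compression, and normalization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen arbitrarily for illustration.
d_model, n_heads, d_head, d_latent, seq = 64, 4, 16, 8, 10

x = rng.normal(size=(seq, d_model))  # token hidden states

# Down-projection: compress each token into a small shared KV latent.
W_dkv = rng.normal(size=(d_model, d_latent))
# Up-projections: expand the latent back out to per-head keys and values.
W_uk = rng.normal(size=(d_latent, n_heads * d_head))
W_uv = rng.normal(size=(d_latent, n_heads * d_head))
W_q = rng.normal(size=(d_model, n_heads * d_head))

c_kv = x @ W_dkv  # (seq, d_latent): this is all the KV cache needs to store
q = (x @ W_q).reshape(seq, n_heads, d_head)
k = (c_kv @ W_uk).reshape(seq, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq, n_heads, d_head)

# Standard scaled dot-product attention per head.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, n_heads * d_head)

print(c_kv.shape, out.shape)
```

The point is the cache line: only `c_kv` (width `d_latent` = 8 per token here) has to be stored, instead of full per-head keys and values (2 × 64 per token in this toy setup).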