AI大模型千问 qwen 中文文档 (Qwen LLM Chinese documentation)

```python
EMBEDDING_DEVICE = "cuda"
# return top-k text chunks from the vector store
VECTOR_SEARCH_TOP_K = 3
CHAIN_TYPE = 'stuff'
embedding_model_dict = {
    "text2vec": "your text2vec model path",
}
llm = Qwen()
embeddings = ...  # embeddings construction elided in the excerpt
prompt = PromptTemplate(
    template=...,  # template text elided in the excerpt
    input_variables=["context_str", "question"]
)
chain_type_kwargs = {"prompt": prompt, "document_variable_name": "context_str"}
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type=CHAIN_TYPE,
    retriever=docsearch.as_retriever(search_kwargs={"k": VECTOR_SEARCH_TOP_K}),
    chain_type_kwargs=chain_type_kwargs,
)
query = "Give me a short introduction to large language model."
print(qa.run(query))
```
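The excerpt elides how `embeddings`, `prompt`, and `docsearch` are built. A minimal sketch of those pieces, assuming the legacy `langchain` package and a local FAISS index; the index path and prompt wording are illustrative, not the documentation's:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name=embedding_model_dict["text2vec"],  # defined in the excerpt above
    model_kwargs={"device": EMBEDDING_DEVICE},
)
docsearch = FAISS.load_local("your_faiss_index_dir", embeddings)  # hypothetical index path

prompt = PromptTemplate(
    template=(
        "Answer the question using only the context below.\n"
        "Context: {context_str}\n"
        "Question: {question}\n"
        "Answer:"
    ),
    input_variables=["context_str", "question"],
)
```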
深度学习与PyTorch入门实战 - 20. 链式法则 (Deep Learning with PyTorch in Practice, Lesson 20: The Chain Rule)

▪ e.g. Softmax

Chain rule
▪ $\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}$
▪ $y_2 = y_1 w_2 + b_2$
▪ $y_1 = x w_1 + b_1$
▪ $\frac{\partial y_2}{\partial w_1} = \frac{\partial y_2}{\partial y_1}\,\frac{\partial y_1}{\partial w_1} = w_2 \cdot x$
▪ $y_2 = (x w_1 + b_1)\,w_2 + b_2$
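A quick PyTorch check of the derivation above: for $y_2 = (x w_1 + b_1)\,w_2 + b_2$, autograd should report $\partial y_2/\partial w_1 = w_2 \cdot x$. The numeric values are arbitrary.

```python
import torch

x = torch.tensor(1.5)
w1 = torch.tensor(2.0, requires_grad=True)
b1 = torch.tensor(1.0)
w2 = torch.tensor(3.0, requires_grad=True)
b2 = torch.tensor(1.0)

y1 = x * w1 + b1
y2 = y1 * w2 + b2
y2.backward()

print(w1.grad.item())   # 4.5
print((w2 * x).item())  # 4.5, matches dy2/dw1 = w2 * x
```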
《Efficient Deep Learning Book》[EDL] Chapter 3 - Learning Techniques

…of the Association for Computational Linguistics: Human Language Technologies, 2011. …mechanism to chain multiple augmentations. It can be replaced with any other library per individual preference.

```python
%%capture
from nltk import download as nltk_download  # assumed alias; the excerpt elides the imports
import nlpaug.augmenter.char as nac         # assumed standard nlpaug aliases
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as naf

[nltk_download(item) for item in ['punkt', 'wordnet']]
aug_args = dict(aug_p=0.3, aug_max=40)
chain = [
    nas.random.RandomSentAug(**aug_args),
    naw.RandomWordAug(action='delete', **aug_args),
    naw.RandomWordAug(action='substitute', **aug_args),
    nac.KeyboardAug(),
]
flow = naf.Sometimes(chain, pipeline_p=0.3)
```

The nlpaug_fn() function just wraps up the augmentation calls in a tf.py_function…
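The book's actual nlpaug_fn() is not shown in the excerpt, so the names and signature below are assumptions. A sketch of how such a wrapper might look for a tf.data pipeline, reusing the `flow` pipeline built above:

```python
import tensorflow as tf

def nlpaug_fn(text):
    # Runs eagerly outside the TF graph; `flow` is the nlpaug pipeline above.
    out = flow.augment(text.numpy().decode())
    return out[0] if isinstance(out, list) else out  # newer nlpaug returns a list

def augment_map(text, label):
    augmented = tf.py_function(nlpaug_fn, [text], tf.string)
    augmented.set_shape(())  # py_function drops static shape; restore scalar shape
    return augmented, label

# usage: dataset = dataset.map(augment_map)
```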
Machine Learning

• …with a directed acyclic graph describing how the functions are composed together
• E.g., we use a chain to represent $f(x) = f_3(f_2(f_1(x)))$
• If we take the sigmoid function as the activation function, $z_1 = \ldots$

Fundamental Equations
• Proof: Rewrite $\delta^{[l]}_j = \partial L/\partial z^{[l]}_j$ in terms of $\delta^{[l+1]}_k = \partial L/\partial z^{[l+1]}_k$
• By the chain rule,
$$\delta^{[l]}_j = \frac{\partial L}{\partial z^{[l]}_j} = \sum_k \frac{\partial L}{\partial z^{[l+1]}_k}\,\frac{\partial z^{[l+1]}_k}{\partial z^{[l]}_j} = \sum_k \frac{\partial z^{[l+1]}_k}{\partial z^{[l]}_j}\,\delta^{[l+1]}_k$$
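A numeric check of the delta recursion above, under the assumption $z^{[l+1]} = W\,\sigma(z^{[l]})$, so the Jacobian entry is $\partial z^{[l+1]}_k/\partial z^{[l]}_j = W_{kj}\,\sigma'(z^{[l]}_j)$; the toy loss makes $\delta^{[l+1]}$ exact:

```python
import numpy as np

sig = lambda z: 1 / (1 + np.exp(-z))
rng = np.random.default_rng(1)
z_l = rng.normal(size=3)            # pre-activations of layer l
W = rng.normal(size=(2, 3))         # weights mapping layer l to layer l+1
delta_next = rng.normal(size=2)     # dL/dz_{l+1} for the toy loss below

# delta recursion from the proof: delta_l = (W^T delta_{l+1}) * sigmoid'(z_l)
delta_l = (W.T @ delta_next) * sig(z_l) * (1 - sig(z_l))

# finite-difference verification against L(z) = delta_next . (W @ sigmoid(z))
L = lambda z: delta_next @ (W @ sig(z))
eps = 1e-6
num = np.array([(L(z_l + eps * np.eye(3)[j]) - L(z_l)) / eps for j in range(3)])
assert np.allclose(num, delta_l, atol=1e-4)
```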
《Efficient Deep Learning Book》[EDL] Chapter 6 - Advanced Learning Techniques - Technical Review

…the weight of the i-th layer, $w_i$, and $\partial L/\partial w_i$, which is the gradient for that layer's weight. Let's start by using the chain rule to compute the partial derivative of the loss function with respect to the last layer's weight. From that, we can calculate the preceding layer's gradient, which is simply one more chain-rule factor multiplied in. More generally, we can calculate $\partial L/\partial w_{i+1}$, and from that $\partial L/\partial w_i$ using the chain rule again. As you can see, if the network has a large number of layers and the weights have small…
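The setup here is the classic vanishing-gradient argument: the chain rule contributes one multiplicative factor per layer, so many small factors shrink the gradient toward zero. A toy illustration (factor value and layer count are arbitrary):

```python
import numpy as np

local_grads = np.full(50, 0.5)   # 50 layers, each contributing a factor of 0.5
print(np.prod(local_grads))      # ~8.9e-16: the accumulated gradient vanishes
```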
Lecture 5: Gaussian Discriminant Analysis, Naive Bayes

…the probability that event A is true given event B is true: $P(A \mid B) = \frac{P(A, B)}{P(B)}$, so $P(A, B) = P(A \mid B)\,P(B)$.

Corollary: the chain rule
$$P(A_1, A_2, \cdots, A_n) = \prod_{k=1}^{n} P(A_k \mid A_1, A_2, \cdots, A_{k-1})$$
Example: $P(A_4, A_3, A_2, A_1) = \ldots$

…$(x(0), y(0), z(0)) = q$. Suppose $h(t) = f(x(t), y(t), z(t))$ such that $h(t)$ has a maximum at $t = 0$. By the chain rule, $h'(t) = \nabla f|_{r(t)} \cdot r'(t)$. Since $t = 0$ is a local maximum, we have $h'(0) = \nabla f|_q \cdot r'(0) = 0$, so $\nabla f$…
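A quick numeric sanity check of the probability chain rule for three binary events, $P(A_1, A_2, A_3) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2)$, on a toy joint distribution (the weights are arbitrary):

```python
import itertools

outcomes = list(itertools.product([0, 1], repeat=3))
joint = {o: w / 36 for o, w in zip(outcomes, [1, 2, 3, 4, 5, 6, 7, 8])}

def p(**fixed):
    # marginal probability of the fixed coordinates among a1, a2, a3
    idx = {"a1": 0, "a2": 1, "a3": 2}
    return sum(pr for o, pr in joint.items()
               if all(o[idx[k]] == v for k, v in fixed.items()))

chain = (p(a1=1)
         * p(a1=1, a2=0) / p(a1=1)                 # P(A2 | A1)
         * p(a1=1, a2=0, a3=1) / p(a1=1, a2=0))    # P(A3 | A1, A2)
assert abs(chain - joint[(1, 0, 1)]) < 1e-12       # factorization matches the joint
```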
深度学习与PyTorch入门实战 - 21. MLP反向传播推导 (Deep Learning with PyTorch in Practice, Lesson 21: Deriving MLP Backpropagation)

MLP反向传播 (MLP backpropagation), Lecturer: 龙良曲

Chain rule
▪ $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial O^{1}}\,\frac{\partial O^{1}}{\partial w_{ij}} = \frac{\partial E}{\partial O^{2}}\,\frac{\partial O^{2}}{\partial O^{1}}\,\frac{\partial O^{1}}{\partial w_{ij}}$

[diagram: multi-output perceptron, inputs $x^0_0, x^0_1, x^0_2$ feeding a weighted sum $\Sigma$, an activation $\sigma$, and the error $E$]
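A numpy check of the single-layer case of this chain rule, assuming a sigmoid activation and squared error for concreteness (the excerpt does not show these choices): for $O = \sigma(x W)$ and $E = \tfrac{1}{2}\sum_k (O_k - t_k)^2$, the chain rule gives $\partial E/\partial w_{jk} = (O_k - t_k)\,O_k (1 - O_k)\,x_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)              # inputs x_j
t = rng.normal(size=2)              # targets t_k
W = rng.normal(size=(3, 2))         # weights w_jk

sig = lambda z: 1 / (1 + np.exp(-z))
O = sig(x @ W)                      # outputs O_k
E = 0.5 * np.sum((O - t) ** 2)

# closed-form chain-rule gradient: dE/dw_jk = (O_k - t_k) * O_k * (1 - O_k) * x_j
grad = np.outer(x, (O - t) * O * (1 - O))

# finite-difference spot check of one entry
eps = 1e-6
Wp = W.copy(); Wp[1, 0] += eps
Ep = 0.5 * np.sum((sig(x @ Wp) - t) ** 2)
assert abs((Ep - E) / eps - grad[1, 0]) < 1e-4
```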
人工智能发展史 (A History of Artificial Intelligence)

▪ NO! Multi-Layer Perceptron is coming
▪ New issue: How to train an MLP?
▪ Chain rule => Backpropagation
▪ http://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf
PyTorch Tutorial

• Autograd: automatic differentiation package
• You don't need to worry about partial differentiation, the chain rule, etc.; backward() does that
• loss.backward()
• Gradients are accumulated for each step by…
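A minimal loop illustrating these bullets: backward() applies the chain rule, and because gradients accumulate across calls, they must be cleared each step. The model and data here are placeholders.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(3):
    opt.zero_grad()                  # clear accumulated gradients
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                  # autograd applies the chain rule
    opt.step()
```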
Lecture 2: Linear Regression

$$g'(0) = \lim_{h \to 0}\frac{g(h) - g(0)}{h} = \lim_{h \to 0}\frac{f(x + hu) - g(0)}{h} = \nabla_u f(x) \tag{1}$$

On the other hand, by the chain rule,

$$g'(h) = \sum_{i=1}^{n} f'_i(x)\,\frac{d}{dh}(x_i + hu_i) = \sum_{i=1}^{n} f'_i(x)\,u_i \tag{2}$$

Let $h = 0$; then $g'(0) = \ldots$
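A numeric check of the identity this proof establishes: the directional derivative of $f$ at $x$ along $u$ equals $\nabla f(x) \cdot u$. The function below is a toy choice, not the lecture's.

```python
import numpy as np

f = lambda v: v[0] ** 2 + 3 * v[0] * v[1]           # toy smooth function
grad = lambda v: np.array([2 * v[0] + 3 * v[1], 3 * v[0]])

x = np.array([1.0, 2.0])
u = np.array([0.6, 0.8])                             # unit direction
h = 1e-6
directional = (f(x + h * u) - f(x)) / h              # limit definition, small h
assert abs(directional - grad(x) @ u) < 1e-4         # matches gradient dot u
```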













