## Mathematics

### Exercise 0.5 (Simple probabilities)

There is a round table with 5 seats, and 5 people born on different days. How many ways are there to seat them so that their ages are ordered?

### Exercise 1 (Basic probabilities)

Mike has 2 kids. If we know that at least one of them is a boy, what is the probability that both of them are boys?

### Exercise 1.5 (Basic probabilities)

An opaque bag contains 2 black balls and 2 white balls. You randomly draw one without looking at it and put it aside. You then draw a second ball and see that it is black. What is the probability that the first ball is white?

### Exercise 2 (Math and computing knowledge)

Give an approximate value of sqrt(2570000).

### Exercise 3 (Math related to data science)

Explain what a hyperplane is (in 2 sentences maximum).

### Exercise 4 (Mathematical reasoning)

The plane contains a finite number of dots of two different colors (two dots of different colors can be in the same location). Between any two points of the same color lies a point of the other color. What can you say about the location of the points?

### Exercise 5 (Probabilities)

You have a choice between two games. In both games, a coin is tossed until a certain pair of sides appears successively: heads followed by heads (game 1), or heads followed by tails (game 2). When the pair appears, the game stops and you gain 1/n €, where n is the number of coin tosses.
For example, if you are playing game 1, the sequence of tosses can be THTHH, and you win 1/5 €. Is there a game which makes you richer on average?

### Exercise 6 (Probabilities)

Consider a country where 40% of the inhabitants live in the northern part, with the rest living in the southern part. In the summer, 30% of the northerners leave the country for vacation, but only 15% of the southerners do so. If, outside of this country, you meet an inhabitant from this country, what is the probability that they are a southerner?

### Exercise 7 (Statistics)

Let’s say we have an estimate of a conversion rate (number of purchases / number of calls) with a statistical accuracy of 1%, with 15 000 calls. How many calls do we need in order to have a statistical accuracy ten times better (0.1%)?

### Exercise 8

Through A/B testing, we get the following results:

| | Test group | Control group |
| ------------- |:-------------:| -----:|
| Group size | 117 415 | 117 284 |
| Conversion rate observed | 7.07% | 9.36% |

What is the confidence interval of the incremental gain (Test conversion rate - control conversion rate)?

### Exercise 9 (More difficult problem)

Arthur chooses a secret polynomial with non-negative integer coefficients. You can give him a real number, and he will give you the value of the polynomial at this number. Find a way of determining the polynomial by asking Arthur for such a value a minimal number of times.

[Hint: a polynomial in x with non-negative integer coefficients can represent a number expressed in base x, for x large enough.]

### Exercise 10 (More difficult problem)

Put N dots in a circle. Select a starting dot (let’s call it #1). While always going in the same direction from this starting dot:

- remove the next remaining dot (at the beginning, this is the dot following the starting dot: dot #2),
- move to the next remaining dot (at the beginning, this is dot #3),
- iterate (remove, move).

At the end, only one point remains: which one?
At first, find a way of calculating this quickly enough (and calculate, for instance, the remaining dot, starting with 20 dots). [Hint: recurrence.]

Then, find a way to calculate the last remaining dot in one step, using binary.

### Exercise 11 (Calculus)

Give an approximate value of sqrt(10) with 3 significant digits.

Give an approximate value of ln 3 with 2 significant digits.

[Hint: Taylor expansion]

## Computer Science

### Exercise 1 (C, C++,…)

What is passing arguments by reference?

### Exercise 2 (Python)

What concepts are important for argument passing in Python?

Or: Python does not have passing by reference, but it does have something similar. What is it?

### Exercise 3 (Databases)

In relational databases, what is a primary key?

### Exercise 4 (Time complexity)

Given an index, what is the time complexity of retrieving an element in an array? In a Python list? What about deleting this element?

Given a key, what is the time complexity of retrieving the associated value from a hash table (which implements an associative array, as is the case for instance in Python)? What about deleting this key?

### Exercise 5 (Time complexity)

What is the complexity of a matrix multiplication?

### Exercise 6 (Shells: Unix,…)

In a Unix shell, what does the shell pipe "|" do?

What does `LD_LIBRARY_PATH=` in the following command do (this is not a question about `LD_LIBRARY_PATH` but a question about the syntax used, what it does, etc.)?

`LD_LIBRARY_PATH= awk…`

### Exercise 7 (More advanced programming)

In object-oriented languages, what is polymorphism?

## Python

### Exercise 1

Do objects have a type in Python?

### Exercise 2

What is a list comprehension? What is a dictionary comprehension?

### Exercise 3

In a Python shell, what does the variable called “_” (underscore) contain?

### Exercise 4 (data science)

Cite a few plotting libraries in Python.

Cite a dataframe manipulation library in Python.

Cite a machine learning library in Python.

### Exercise 5 (special character handling)

Explain what the encoding and decoding of characters are. What do you know about Unicode? UTF-8? UTF-16?

### Exercise 6 (more advanced)

What does the heapq module do? In which situation is it useful?

### Exercise 7 (more advanced)

What is a *staticmethod*? What is a *classmethod*?

### Exercise 8 (for pros)

In what circumstances is the variable called “_” (underscore) typically used in a Python program?

## Machine learning

### Exercise 0 (Basic)

Do you normalize variables before doing machine learning?

### Exercise 1 (Model fitting)

Explain what overfitting is in machine learning (in 2 sentences maximum).

How do you see that a model is overfitting?

### Exercise 2 (Model fitting)

What is regularization, and what does it do when fitting a model?

### Exercise 3 (Basic probabilities and fitting)

What is the principle behind linear regression? Prove the formula that gives the parameters of the regression.

### Exercise 4 (Non-linear Support Vector Machines)

In Support Vector Machines, what is the kernel trick?

### Exercise 5 (More advanced probabilities/modeling)

What is the log-loss function: formula, including in the case of more than two classes? In what cases should it be used (instead of, say, a sum of squares error)?

## Business and Insurance

### Exercise 0

What are typically the different steps of the work of a data scientist on a business project?

### Exercise 1

What is an uplift model?

### Exercise 2

How would you test a targeting model in real life?

### Exercise 3 (Big Data/Insurance)

How do you think Big Data can have an impact in the insurance sector?

### Exercise 4 (Insurance)

What are the 3 biggest business lines in the insurance sector?

### Exercise 5 (Insurance)

What happens if an insurance company gives the same contract price to everybody?

### Exercise 6 (Insurance)

How would you price a contract?

### Exercise 7

I have tested the conversion rate of my cross-sell model using an A/B test (randomized test).
I get the following results:

| | Test group | Control group |
| ------------- |:-------------:| -----:|
| Group size | 117 415 | 117 284 |
| Conversion rate observed | 7.07% | 9.36% |

What is the confidence interval of the incremental gain (TEST conversion rate - CONTROL conversion rate)?
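
One possible way to approach the question above is a normal-approximation (Wald) interval for the difference of two proportions. The sketch below is only an illustration: it assumes a 95% confidence level and independent groups, neither of which is stated in the exercise, and the function name `diff_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(p_test, n_test, p_control, n_control, level=0.95):
    """Wald interval for p_test - p_control (assumes independent groups)."""
    diff = p_test - p_control
    # Standard error of the difference of two independent proportions.
    se = sqrt(p_test * (1 - p_test) / n_test
              + p_control * (1 - p_control) / n_control)
    # Two-sided critical value of the standard normal distribution
    # (the 95% default level is an assumption, not given in the exercise).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Figures taken from the table above.
low, high = diff_confidence_interval(0.0707, 117_415, 0.0936, 117_284)
print(f"95% CI for the incremental gain: [{low:.4%}, {high:.4%}]")
```

With the figures from the table, both bounds come out negative, meaning the observed test conversion rate is lower than the control one under these assumptions.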