## Mathematics

### Exercise 0.5 (Simple probabilities)

There is a round table with 5 seats, and 5 people born on different days. How many ways are there to seat them so that their ages are ordered?

### Exercise 1 (Basic probabilities)

Mike has 2 kids. If we know that at least one of them is a boy, what is the probability that both of them are boys?

### Exercise 1.5 (Basic probabilities)

An opaque bag contains 2 black balls and 2 white balls. You randomly draw one without looking at it and put it aside. You then draw a second ball and see that it is black. What is the probability that the first ball is white?

### Exercise 2 (Math and computing knowledge)

Give an approximate value of sqrt(2570000).

### Exercise 3 (Math related to data science)

Explain what a hyperplane is (in 2 sentences maximum).

### Exercise 4 (Mathematical reasoning)

The plane contains a finite number of dots of two different colors (two dots of different colors can be in the same location). Between two dots of the same color lies a dot of the other color. What can you say about the location of the dots?

### Exercise 5 (Probabilities)

You have a choice between two games. In both games, a coin is tossed until a certain pair of sides appears successively: heads followed by heads (game 1), or heads followed by tails (game 2). When the pair appears, the game stops and you gain 1/n €, where n is the number of coin tosses. For example, if you are playing game 1, the sequence of tosses can be THTHH, and you win 1/5 €. Is there a game which makes you richer on average?

### Exercise 6 (Probabilities)

Consider a country where 40% of the inhabitants live in the northern part, with the rest living in the southern part. In the summer, 30% of the northerners leave the country for vacation, but only 15% of the southerners do so. If, outside of this country, you meet an inhabitant from this country, what is the probability that they are a southerner?

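One possible way to check an answer to Exercise 6 is to apply Bayes' rule with exact fractions. The sketch below is only an illustration: the function name `posterior_south` and its parameter names are hypothetical, and it assumes that the person you meet abroad is equally likely to be any inhabitant who left the country.

```python
from fractions import Fraction


def posterior_south(p_north, leave_north, leave_south):
    """Probability that a traveller met abroad is a southerner (Bayes' rule).

    Assumption: every inhabitant who left is equally likely to be met.
    """
    p_south = 1 - p_north
    abroad_south = p_south * leave_south   # P(south and abroad)
    abroad_north = p_north * leave_north   # P(north and abroad)
    return abroad_south / (abroad_south + abroad_north)


# Figures taken from the exercise statement.
answer = posterior_south(Fraction(40, 100), Fraction(30, 100), Fraction(15, 100))
print(answer, float(answer))
```

Using `Fraction` keeps the result exact, which makes it easy to compare with a hand calculation.
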
### Exercise 7 (Statistics)

Let’s say we have an estimate of a conversion rate (number of purchases / number of calls) with a statistical accuracy of 1%, with 15 000 calls. How many calls do we need in order to have a statistical accuracy ten times better (0.1%)?

### Exercise 8

Through A/B testing, we get the following results:

| | Test group | Control group |
| ------------- |:-------------:| -----:|
| Group size | 117 415 | 117 284 |
| Conversion rate observed | 7.07% | 9.36% |

What is the confidence interval of the incremental gain (Test conversion rate - Control conversion rate)?

### Exercise 9 (More difficult problem)

Arthur chooses a secret polynomial with non-negative integer coefficients. You can give him a real number, and he will give you the value of the polynomial at this number. Find a way of determining the polynomial by asking Arthur for such a value a minimal number of times.

[Hint: a polynomial in x with non-negative integer coefficients can represent a number expressed in base x, for x large enough.]

### Exercise 10 (More difficult problem)

Put N dots in a circle. Select a starting dot (let’s call it #1). While always going in the same direction from this starting dot:

- remove the next remaining dot (at the beginning, this is the dot following the starting dot: dot #2),
- move to the next remaining dot (at the beginning, this is dot #3),
- iterate (remove, move).

At the end, only one dot remains: which one?

At first, find a way of calculating this quickly enough (and calculate, for instance, the remaining dot, starting with 20 dots). [Hint: recurrence.]

Then, find a way to calculate the last remaining dot in one step, that uses binary.

### Exercise 11 (Calculus)

Give an approximate value of sqrt(10) with 3 significant digits.

Give an approximate value of ln 3 with 2 significant digits.

[Hint: Taylor expansion]

## Computer Science

### Exercise 1 (C, C++,…)

What is passing arguments by reference?

### Exercise 2 (Python)

What concepts are important for argument passing in Python? Or: Python does not have passing by reference, but it does have something similar. What is it?

### Exercise 3 (Databases)

In relational databases, what is a primary key?

### Exercise 4 (Time complexity)

Given an index, what is the time complexity of retrieving an element in an array? In a Python list? What about deleting this element?

Given a key, what is the time complexity of retrieving the associated value from a hash table (which implements an associative array, as is the case for instance in Python)? What about deleting this key?

### Exercise 5 (Time complexity)

What is the complexity of a matrix multiplication?

### Exercise 6 (Shells: Unix,…)

In a Unix shell, what does the shell pipe "|" do?

What does `LD_LIBRARY_PATH=` in the following command do (this is not a question about `LD_LIBRARY_PATH` but a question about the syntax used, what it does, etc.)?

`LD_LIBRARY_PATH= awk…`

### Exercise 7 (More advanced programming)

In object-oriented languages, what is polymorphism?

## Python

### Exercise 1

Do objects have a type in Python?

### Exercise 2

What is a list comprehension? What is a dictionary comprehension?

### Exercise 3

In a Python shell, what does the variable called “_” (underscore) contain?

### Exercise 4 (data science)

Name a few plotting libraries in Python. Name a dataframe manipulation library in Python. Name a machine learning library in Python.

### Exercise 5 (special character handling)

Explain what the encoding and decoding of characters are. What do you know about Unicode? UTF-8? UTF-16?

### Exercise 6 (more advanced)

What does the heapq module do? In which situation is it useful?

### Exercise 7 (more advanced)

What is a *staticmethod*? What is a *classmethod*?

### Exercise 8 (for pros)

In what circumstances is the variable called “_” (underscore) typically used in a Python program?

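As a starting point for the discussion in Exercise 8, here is a minimal sketch of one common convention: using `_` as a placeholder for a value that is deliberately ignored. The variable names and sample strings are hypothetical, and this is not meant as an exhaustive answer.

```python
# "_" as a throwaway loop variable: the index itself is never used.
greetings = ["hello" for _ in range(3)]

# "_" as a placeholder when unpacking: the separator is discarded.
stem, _, extension = "report.pdf".rpartition(".")

print(greetings, stem, extension)
```
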
## Machine learning

### Exercise 0 (Basic)

Do you normalize variables before doing machine learning?

### Exercise 1 (Model fitting)

Explain what overfitting is in machine learning (in 2 sentences maximum). How do you see that a model is overfitting?

### Exercise 2 (Model fitting)

What is regularization and what does it do, when fitting a model?

### Exercise 3 (Basic probabilities and fitting)

What is the principle behind linear regression? Prove the formula that gives the parameters of the regression.

### Exercise 4 (Non-linear Support Vector Machines)

In Support Vector Machines, what is the kernel trick?

### Exercise 5 (More advanced probabilities/modeling)

What is the log-loss function: formula, including in the case of more than two classes? In what cases should it be used (instead of, say, a sum of squares error)?

## Business and Insurance

### Exercise 0

What are typically the different steps of the work of a data scientist on a business project?

### Exercise 1

What is an uplift model?

### Exercise 2

How would you test a targeting model in real life?

### Exercise 3 (Big Data/Insurance)

How do you think Big Data can have an impact in the insurance sector?

### Exercise 4 (Insurance)

What are the 3 biggest business lines in the insurance sector?

### Exercise 5 (Insurance)

What happens if an insurance company gives the same contract price to everybody?

### Exercise 6 (Insurance)

How would you price a contract?

### Exercise 7

I have tested the conversion rate of my cross-sell model using an A/B test (randomized test). I get the following results:

| | Test group | Control group |
| ------------- |:-------------:| -----:|
| Group size | 117 415 | 117 284 |
| Conversion rate observed | 7.07% | 9.36% |

What is the confidence interval of the incremental gain (TEST conversion rate - CONTROL conversion rate)?
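
One possible approach to the confidence interval question above is the normal-approximation (Wald) interval for a difference of two proportions. The sketch below is an illustration under stated assumptions, not the only valid method: the function name `diff_confidence_interval` is hypothetical, the two groups are assumed independent, and a 95% level (z = 1.96) is assumed because the exercise does not specify one.

```python
import math


def diff_confidence_interval(n_test, p_test, n_control, p_control, z=1.96):
    """Wald interval for (p_test - p_control).

    Assumptions: independent groups, samples large enough for the normal
    approximation, and z = 1.96 for a 95% confidence level.
    """
    diff = p_test - p_control
    # Standard error of the difference of two independent proportions.
    std_err = math.sqrt(
        p_test * (1 - p_test) / n_test
        + p_control * (1 - p_control) / n_control
    )
    return diff - z * std_err, diff + z * std_err


# Figures taken from the table in the exercise.
low, high = diff_confidence_interval(117_415, 0.0707, 117_284, 0.0936)
print(f"95% CI for test - control: [{low:.4%}, {high:.4%}]")
```

If the whole interval lies on one side of zero, the observed difference between the groups is unlikely to be due to sampling noise alone at the chosen level.
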